Automated Compliance Checking of Data Pipelines Using Multi-Agent LLMs
Ensuring that complex data pipelines adhere to a growing web of legal, ethical, and organizational requirements (e.g., GDPR, the EU AI Act) is a major challenge. This thesis will examine how multi-agent LLM systems can automate compliance checking of data pipelines, a novel application of AI to regulatory and policy assurance in data-intensive systems.
The study will include the design of a system in which agents parse pipeline specifications, map them to relevant regulatory constraints, and collaboratively reason about compliance violations. The research will involve implementing a prototype that integrates compliance rule sources, pipeline metadata, and multi-agent reasoning strategies. The system will be evaluated on real-world or synthetic data pipelines with respect to the accuracy of compliance detection, scalability, and trustworthiness.
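As a rough illustration of the flow described above (parse a pipeline specification, map it to constraints, report violations), here is a minimal sketch. Everything in it is an assumption made for illustration: the `Step` and `Rule` structures, the rule identifiers, the list of personal fields, and the example policies are hypothetical, and the deterministic functions merely stand in for what would be LLM-backed agents in the actual prototype.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    """One pipeline step, as a hypothetical analyzer agent might extract it."""
    name: str
    region: str               # where the step processes data
    fields: frozenset         # names of the data fields the step reads
    has_consent: bool = False  # whether consent is recorded for this data


@dataclass(frozen=True)
class Rule:
    """A machine-interpretable constraint (hypothetical format)."""
    rule_id: str
    description: str
    violated_by: Callable[[Step], bool]


# Illustrative only: a real system would derive this from a data catalog.
PERSONAL_FIELDS = frozenset({"email", "name", "location"})

# Hypothetical organizational policies, not a statement of what any law requires.
RULES = [
    Rule("consent", "Personal data requires recorded consent.",
         lambda s: bool(s.fields & PERSONAL_FIELDS) and not s.has_consent),
    Rule("locality", "Personal data must be processed in region 'eu'.",
         lambda s: bool(s.fields & PERSONAL_FIELDS) and s.region != "eu"),
]


def analyze_pipeline(spec):
    """Stand-in for a pipeline analyzer agent: spec dicts -> Step objects."""
    return [
        Step(d["name"], d["region"], frozenset(d["fields"]),
             d.get("has_consent", False))
        for d in spec
    ]


def check_compliance(steps, rules):
    """Stand-in for a compliance checker agent: (step name, rule id) pairs."""
    return [(s.name, r.rule_id) for s in steps for r in rules if r.violated_by(s)]


spec = [
    {"name": "ingest", "region": "eu", "fields": ["email", "reading"], "has_consent": True},
    {"name": "train", "region": "us", "fields": ["email", "reading"], "has_consent": True},
    {"name": "aggregate", "region": "us", "fields": ["reading"]},
]

for step_name, rule_id in check_compliance(analyze_pipeline(spec), RULES):
    print(f"{step_name}: violates '{rule_id}'")
```

With this example specification, only the `train` step is flagged (for `locality`): it reads a personal field outside the assumed permitted region, while `aggregate` touches no personal fields at all. Keeping rules as data separate from the checker is one possible design choice; it would let a rule extraction agent add constraints without changing the checking logic.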
Research Topic Focus
The work to be done includes:
- Literature Review: Study existing research on compliance checking for data pipelines, legal AI, and regulatory reasoning with LLMs.
- Requirement Gathering: Identify regulatory frameworks and policies relevant for pipelines (e.g., GDPR, AI Act, data locality rules).
- Knowledge Representation: Formalize a subset of compliance rules (e.g., data minimization, consent, logging) into machine-interpretable formats.
- Agent Design: Define specialized agents such as:
  - Rule Extraction Agent (maps text to structured constraints).
  - Pipeline Analyzer Agent (extracts pipeline metadata and processes).
  - Compliance Checker Agent (matches pipeline behavior with rules).
  - Reporting Agent (generates compliance reports and recommendations).
- Prototype Development: Implement a working system integrating agents, pipeline metadata, and compliance rules.
- Case Studies: Apply the prototype to synthetic and real-world pipeline examples (e.g., bioinformatics workflows, IoT data pipelines).
- Evaluation: Assess accuracy, false positives/negatives, scalability, and explainability of compliance assessments.
- Discussion: Analyze the results, limitations, and implications of the approach for automated governance.
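The evaluation step in the list above could be made concrete with standard detection metrics. The sketch below assumes, hypothetically, that each violation is identified by a (pipeline step, rule) pair and that a hand-labeled ground truth is available for each case study; the function name `detection_metrics` and the example labels are illustrative and not part of the proposal.

```python
def detection_metrics(predicted, actual):
    """Compare predicted violations against ground-truth violations.

    Both arguments are sets of (step_name, rule_id) pairs.
    """
    true_pos = len(predicted & actual)    # correctly flagged violations
    false_pos = len(predicted - actual)   # flagged, but not real violations
    false_neg = len(actual - predicted)   # real violations that were missed

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for one case study.
predicted = {("train", "locality"), ("ingest", "consent")}
actual = {("train", "locality"), ("export", "logging")}

metrics = detection_metrics(predicted, actual)
print(metrics)
```

In this example one violation is detected correctly, one is a false positive, and one is missed, so precision, recall, and F1 all come out to 0.5. Scalability and explainability would need separate measures (for instance runtime per pipeline step, or human ratings of the generated reports), which this sketch does not cover.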
Expected Results and Learning Outcome
- Expected Contribution:
  - A functional prototype of a multi-agent system for automated compliance checking.
  - A novel methodology and empirical insights into applying LLMs for regulatory assurance in data systems.
- Learning Outcomes:
  - Practical experience building and evaluating multi-agent LLM systems.
  - An interdisciplinary understanding of the intersection between AI, data engineering, and regulatory compliance (Legal Tech).
  - Skills in knowledge representation and applying AI for complex reasoning tasks.
Qualifications
- Required: Strong programming experience in Python.
- Knowledge: A solid understanding of LLMs and fundamental data engineering concepts.
- Interest: A keen interest in data privacy, AI ethics, and regulatory technology (RegTech) is essential for this thesis.
- Advantageous: Familiarity with knowledge representation (e.g., ontologies, RDF) or prior exposure to legal or policy documents.
References
- Sen, S., LexAlign: Towards a Multiagent AI System for Regulatory Compliance of Data/AI Pipelines.