
Automated Compliance Checking of Data Pipelines Using Multi-Agent LLMs

Ensuring that complex data pipelines adhere to a growing web of legal, ethical, and organizational requirements (e.g., the GDPR, the EU AI Act) is a major challenge. This thesis will examine how multi-agent LLMs can automate compliance checking of data pipelines, exploring a novel application of AI for regulatory and policy assurance in data-intensive systems.


The study will include the design of a system where agents parse pipeline specifications, map them to relevant regulatory constraints, and collaboratively reason about compliance violations. The research will involve implementing a prototype that integrates compliance rule sources, pipeline metadata, and multi-agent reasoning strategies. The system will be evaluated on real-world or synthetic data pipelines, assessing the accuracy of compliance detection, scalability, and trustworthiness.  
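As a rough illustration of how such agents might be composed, the sketch below wires an analyzer, a checker, and a reporter into a simple sequence. All class names, fields, and the retention threshold are hypothetical, and the agents use hand-written logic where a real prototype would prompt an LLM:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PipelineStep:
    """Metadata about one step of a data pipeline (illustrative fields)."""
    name: str
    stores_personal_data: bool = False
    retention_days: Optional[int] = None

class PipelineAnalyzerAgent:
    """Extracts structured metadata from a pipeline specification."""
    def extract_metadata(self, spec: dict) -> List[PipelineStep]:
        return [PipelineStep(**step) for step in spec["steps"]]

class ComplianceCheckerAgent:
    """Matches pipeline behavior against rules (rule-based stand-in for an LLM)."""
    MAX_RETENTION_DAYS = 30  # assumed organizational policy, not a legal constant

    def check(self, steps: List[PipelineStep]) -> List[str]:
        violations = []
        for step in steps:
            if step.stores_personal_data and (
                step.retention_days is None
                or step.retention_days > self.MAX_RETENTION_DAYS
            ):
                violations.append(
                    f"{step.name}: personal data retained beyond policy limit"
                )
        return violations

class ReportingAgent:
    """Turns violation findings into a human-readable compliance report."""
    def report(self, violations: List[str]) -> str:
        if not violations:
            return "COMPLIANT"
        return "NON-COMPLIANT:\n" + "\n".join(f"- {v}" for v in violations)

spec = {"steps": [
    {"name": "ingest", "stores_personal_data": True, "retention_days": 365},
    {"name": "aggregate"},
]}
steps = PipelineAnalyzerAgent().extract_metadata(spec)
violations = ComplianceCheckerAgent().check(steps)
print(ReportingAgent().report(violations))
```

In a real prototype, the checker and analyzer would be backed by LLM calls and the message passing mediated by an orchestration framework; the fixed pipeline of three agents here only shows the division of responsibilities.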

Research Topic Focus

The work to be done includes:  

  1. Literature Review: Study existing research on compliance checking for data pipelines, legal AI, and regulatory reasoning with LLMs.
  2. Requirement Gathering: Identify regulatory frameworks and policies relevant for pipelines (e.g., GDPR, AI Act, data locality rules). 
  3. Knowledge Representation: Formalize a subset of compliance rules (e.g., data minimization, consent, logging) into machine-interpretable formats. 
  4. Agent Design: Define specialized agents such as:  
    • Rule Extraction Agent (maps text to structured constraints).  
    • Pipeline Analyzer Agent (extracts pipeline metadata and processes).  
    • Compliance Checker Agent (matches pipeline behavior with rules).  
    • Reporting Agent (generates compliance reports and recommendations).
  5. Prototype Development: Implement a working system integrating agents, pipeline metadata, and compliance rules.
  6. Case Studies: Apply the prototype to synthetic and real-world pipeline examples (e.g., bioinformatics workflows, IoT data pipelines). 
  7. Evaluation: Assess accuracy, false positives/negatives, scalability, and explainability of compliance assessments.
  8. Discussion: Analyze the results, limitations, and implications of the approach for automated governance.
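To make step 3 concrete, one possible machine-interpretable encoding of a compliance rule is a small declarative record that agents can evaluate against pipeline metadata. The rule format, field names, and the data-minimization condition below are illustrative assumptions, not drawn from any standard:

```python
import json

# A simplified, illustrative rule format; field names are assumptions.
RULES = json.loads("""
[
  {
    "id": "data-minimization",
    "description": "Steps may only read fields they declare a purpose for.",
    "applies_to": "step",
    "require": "set(step['fields_read']) <= set(step['fields_needed'])"
  }
]
""")

def evaluate(rule: dict, step: dict) -> bool:
    # Evaluate the rule's condition against one pipeline step.
    # eval() on a fixed, trusted rule string keeps this sketch short;
    # a real system would use a rule engine or a parsed DSL instead.
    return eval(rule["require"], {"set": set}, {"step": step})

step = {"name": "train_model",
        "fields_read": ["age", "zip", "name"],
        "fields_needed": ["age", "zip"]}

for rule in RULES:
    ok = evaluate(rule, step)
    print(f"{rule['id']}: {'PASS' if ok else 'FAIL'} for {step['name']}")
```

Storing rules as data rather than code is what lets a Rule Extraction Agent produce them from regulatory text and a Compliance Checker Agent consume them uniformly; richer formalisms (ontologies, RDF/SHACL, or a dedicated policy language) would be natural next steps.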

Expected Results and Learning Outcome

  • Expected Contribution:  
    • A functional prototype of a multi-agent system for automated compliance checking.  
    • A novel methodology and empirical insights into applying LLMs for regulatory assurance in data systems.
  • Learning Outcomes:  
    • Practical experience building and evaluating multi-agent LLM systems.  
    • An interdisciplinary understanding of the intersection between AI, data engineering, and regulatory compliance (Legal Tech).  
    • Skills in knowledge representation and applying AI for complex reasoning tasks.

Qualifications

  • Required: Strong programming experience in Python.  
  • Knowledge: A solid understanding of LLMs and fundamental data engineering concepts.
  • Interest: A keen interest in topics such as data privacy, AI ethics, and regulatory technology (RegTech) is essential for this thesis.  
  • Advantageous: Familiarity with knowledge representation (e.g., ontologies, RDF) or prior exposure to legal or policy documents would be a plus.
