Abstract
Ensuring the legal compliance of data pipelines with evolving EU legislation, such as the AI Act, is a significant challenge due to the complexity of both the technical infrastructures and the regulatory texts involved. This paper explores the potential of Large Language Models (LLMs) to support automated compliance assessment of data pipelines and proposes an approach that leverages LLM-based agents to extract, label, and assess data pipeline artifacts against relevant legal requirements, guided by the actor’s role and the system’s risk level. By decomposing the assessment process into modular agent tasks, we mitigate LLM context-window (token) limitations and enable fine-grained analysis of individual regulatory obligations. The approach is supported by a prototype implementation that integrates outputs from SIM-PIPE, a tool for simulating and analyzing big data pipelines, with a structured interpretation of the regulatory text, in this case the AI Act. The implementation demonstrates the feasibility of intelligent, scalable compliance auditing and highlights key challenges related to trust, context interpretation, and output validity. We argue that such LLM-powered tools can play a critical role in advancing compliance-by-design practices for legally aligned data-driven systems.