Abstract
Public procurement accounts for over $13 trillion in spending annually, yet data about public buyers and suppliers remains fragmented, inconsistent, and difficult to link across jurisdictions. This paper presents a practical industrial solution
developed by Spend Network within the European project enRichMyData to semantically enrich and reconcile
procurement data at scale. The proposed pipeline combines large language models (LLMs) with knowledge graphs
(KGs) to create and maintain a canonical register of public sector entities. It supports multilingual, cross-border
integration and is designed to serve both public transparency and commercial applications. Evaluated on a
manually curated benchmark of 1,000 procurement-related entities, the pipeline demonstrates high
precision and scalability in real-world settings.