Building a Canonical Register of Public Sector Entities: Semantic Linking of Procurement Data at Scale

Abstract

Public procurement generates over $13 trillion annually, yet data about public buyers and suppliers remains fragmented, inconsistent, and difficult to link across jurisdictions. This paper presents a practical industrial solution developed by Spend Network within the European project enRichMyData to semantically enrich and reconcile procurement data at scale. The proposed pipeline combines large language models (LLMs) with knowledge graphs (KGs) to create and maintain a canonical register of public sector entities. It supports multilingual, cross-border integration and is designed to serve both public transparency and commercial applications. The pipeline has been evaluated on a manually curated benchmark of 1,000 procurement-related entities and demonstrates high precision and scalability in real-world settings.

Language

English

Author(s)

Roberto Avogadro
Ian Makgill
Aleena Thomas
Ahmet Soylu
Titi Roman

Affiliation

SINTEF Digital / Sustainable Communication Technologies
United Kingdom
Kristiania University of Applied Sciences

Year

2025

Published in

CEUR Workshop Proceedings

Volume

4085

DOI

https://ceur-ws.org/vol-4085
