
Building a Canonical Register of Public Sector Entities: Semantic Linking of Procurement Data at Scale

Abstract

Public procurement accounts for over $13 trillion in spending annually, yet data about public buyers and suppliers remains fragmented, inconsistent, and difficult to link across jurisdictions. This paper presents an industrial solution developed by Spend Network within the European project enRichMyData to semantically enrich and reconcile procurement data at scale. The proposed pipeline combines large language models (LLMs) with knowledge graphs (KGs) to create and maintain a canonical register of public sector entities. It supports multilingual, cross-border integration and is designed to serve both public transparency and commercial applications. The pipeline has been evaluated on a manually curated benchmark of 1,000 procurement-related entities and demonstrates high precision and scalability in real-world settings.
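The abstract does not describe how raw buyer names are matched to register entries. To give a rough sense of the reconciliation task, here is a minimal baseline that uses only the Python standard library: it normalizes names (casefolding, stripping diacritics and punctuation, dropping filler words) and picks the most similar canonical name above a threshold. A precision helper of the kind a curated benchmark would support is included. This is not the authors' LLM- and KG-based method. The register entries, the `ENT-…` identifiers, the filler-word list and the 0.85 threshold are all hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical filler words dropped before comparison (illustrative, not exhaustive).
FILLER = {"the", "of", "and", "de", "la", "le", "du", "des", "der", "die", "und"}

# Hypothetical canonical register: entity ID -> preferred name.
REGISTER = {
    "ENT-001": "Greater London Authority",
    "ENT-002": "Ville de Paris",
    "ENT-003": "Stadt München",
}


def normalize(name: str) -> str:
    """Casefold, strip diacritics and punctuation, and drop filler words."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"\w+", no_marks)  # \w is Unicode-aware, so non-Latin scripts survive
    return " ".join(t for t in tokens if t not in FILLER)


def match(raw_name: str, register: dict[str, str], threshold: float = 0.85) -> Optional[str]:
    """Return the ID of the most similar canonical name, or None if below threshold."""
    query = normalize(raw_name)
    best_id, best_score = None, 0.0
    for entity_id, canonical in register.items():
        score = SequenceMatcher(None, query, normalize(canonical)).ratio()
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= threshold else None


def precision(predicted: dict[str, Optional[str]], gold: dict[str, str]) -> float:
    """Fraction of non-None predictions that agree with the gold labels."""
    attempted = [k for k, v in predicted.items() if v is not None]
    if not attempted:
        return 0.0
    return sum(predicted[k] == gold[k] for k in attempted) / len(attempted)


if __name__ == "__main__":
    for raw in ["GREATER LONDON AUTHORITY (GLA)", "Stadt Munchen", "Ministry of Defence"]:
        print(raw, "->", match(raw, REGISTER))
```

Character-level similarity is cheap but brittle. It cannot recognize that an acronym on its own, or a name given in another language, refers to the same body. A semantic approach like the one summarized in the abstract would presumably need to close that gap. Precision, as computed here, counts only the matches the system actually proposes, so abstaining (returning `None`) does not lower it.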

Category

Academic article

Language

English

Author(s)

Affiliation

  • SINTEF Digital / Sustainable Communication Technologies
  • United Kingdom
  • Kristiania University of Applied Sciences

Date

01.01.2025

Year

2025

Published in

CEUR Workshop Proceedings

Volume

4085

View this publication at Norwegian Research Information Repository