Validation of entity linking algorithm over different data sources for generalization


Keywords: Entity linking, Data enrichment, Tabular data, Knowledge Graphs (KGs), Evaluation

Entity linking is a core task in Natural Language Processing (NLP) and Information Retrieval (IR) [Shen et al 2021]. It associates strings of text that refer to entities (known as mentions) with the corresponding entries in a knowledge graph or database. This lets a system determine which real-world entity a mention refers to in a given context, which matters most when the mention is ambiguous. For instance, the mention "Paris" may refer to either "Paris, France" or "Paris, Texas", and entity linking resolves that ambiguity [Yin et al 2019].
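As a rough illustration of the "Paris" example, the sketch below picks the candidate whose description shares the most words with the surrounding text. The toy knowledge graph, its identifiers, and the function names are assumptions made for this example; word overlap is a deliberately simple stand-in for the learned models surveyed in the cited work.

```python
import re

# Hypothetical toy knowledge graph: ids and descriptions are illustrative only.
TOY_KG = {
    "kg:paris_france": {
        "label": "Paris",
        "description": "capital city of France on the Seine",
    },
    "kg:paris_texas": {
        "label": "Paris",
        "description": "city in Lamar County Texas United States",
    },
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link_mention(mention, context, kg=TOY_KG):
    """Return the id of the candidate that best matches the context, or None.

    Candidates are entries whose label equals the mention (case-insensitive);
    the score is the number of words shared by the context and the description.
    """
    candidates = [
        (entity_id, entry)
        for entity_id, entry in kg.items()
        if entry["label"].lower() == mention.lower()
    ]
    if not candidates:
        return None
    context_words = tokens(context)
    best_id, _ = max(
        candidates,
        key=lambda item: len(context_words & tokens(item[1]["description"])),
    )
    return best_id


if __name__ == "__main__":
    print(link_mention("Paris", "The Eiffel Tower is in Paris, the capital of France."))
    print(link_mention("Paris", "Paris is a city in Lamar County, Texas."))
```

The same mention resolves to different entries depending on the context it appears in, which is the ambiguity the paragraph above describes.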

For structured data such as tables, entity linking associates individual cells with the appropriate entries in a knowledge graph. Linking enriches the table with external information and context, which makes the data more meaningful and easier to interpret. For instance, in a table of movies, a cell containing "The Matrix" would be linked to the corresponding entry in a knowledge graph, which provides additional details about the film such as its director and release date.
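A minimal sketch of the movie example follows, assuming an in-memory knowledge graph and exact label matching. The "ex:" identifiers, the dictionary layout, and the function name are hypothetical; a real system would also have to handle misspellings and ambiguous labels.

```python
# Hypothetical in-memory KG keyed by lowercase label; ids use a made-up "ex:" prefix.
TOY_KG = {
    "the matrix": {
        "id": "ex:the_matrix",
        "director": "The Wachowskis",
        "release_year": 1999,
    },
    "inception": {
        "id": "ex:inception",
        "director": "Christopher Nolan",
        "release_year": 2010,
    },
}


def link_column(rows, column, kg=TOY_KG):
    """Link each cell of `column` to a KG entry and copy its facts into the row.

    Rows are dicts; the input is not modified. Cells without a match get
    kg_id=None and no enrichment fields.
    """
    enriched = []
    for row in rows:
        entry = kg.get(row[column].strip().lower())
        new_row = dict(row)
        new_row["kg_id"] = entry["id"] if entry else None
        if entry:
            new_row["director"] = entry["director"]
            new_row["release_year"] = entry["release_year"]
        enriched.append(new_row)
    return enriched


if __name__ == "__main__":
    table = [{"title": "The Matrix"}, {"title": "Unknown Film"}]
    for row in link_column(table, "title"):
        print(row)
```

The linked row gains an identifier plus the director and release year, while the unmatched row is left without an annotation instead of being given a guessed one.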

This project builds on an existing solution, available at https://github.com/roby-avo/alligator and https://bitbucket.org/disco_unimib/lamapi. The objective is to validate that solution against several Knowledge Graphs (KGs), such as Wikidata, DBpedia, and Crunchbase, in order to confirm its effectiveness and extend it to support other well-known KGs.
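One way to picture support for several KGs is a common lookup interface that each KG backend implements. The sketch below is an assumption made for illustration: the class names, the method, and the identifiers are hypothetical and do not describe the actual API of the two repositories.

```python
from typing import Dict, List, Protocol


class KGLookup(Protocol):
    """Hypothetical interface that every KG backend would implement."""

    name: str

    def lookup(self, label: str) -> List[str]:
        """Return candidate entity ids for a label."""
        ...


class DictKG:
    """Toy backend backed by a dict; stands in for a real KG service."""

    def __init__(self, name: str, index: Dict[str, List[str]]):
        self.name = name
        # Normalize keys once so that lookups are case-insensitive.
        self._index = {key.lower(): ids for key, ids in index.items()}

    def lookup(self, label: str) -> List[str]:
        return list(self._index.get(label.strip().lower(), []))


def lookup_everywhere(label: str, backends: List[KGLookup]) -> Dict[str, List[str]]:
    """Query every backend with the same label and collect candidates per KG."""
    return {kg.name: kg.lookup(label) for kg in backends}


if __name__ == "__main__":
    backends = [
        DictKG("kg_a", {"The Matrix": ["a:movie-1"]}),  # made-up ids
        DictKG("kg_b", {}),
    ]
    print(lookup_everywhere("the matrix", backends))
```

With an interface of this kind, adding a KG means writing one new backend, and the linking and evaluation code can stay unchanged.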

Work to be done:

  • Identify the most widely used KGs.
  • Integrate the selected KGs into LamAPI.
  • Validate the proposed solution across the various integrated KGs.
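The validation step can be sketched as a comparison between predicted and gold cell annotations, reported separately for each KG. Precision, recall, and F1 are a common choice for this kind of evaluation, but the metric selection, the data layout, and the function names below are assumptions made for this example.

```python
def precision_recall_f1(predicted, gold):
    """Score cell annotations against a gold standard.

    Both arguments map a cell position (row, column) to an entity id.
    A missing cell or a None value in `predicted` counts as "no answer":
    it lowers recall but not precision.
    """
    answered = {cell: eid for cell, eid in predicted.items() if eid is not None}
    correct = sum(1 for cell, eid in answered.items() if gold.get(cell) == eid)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def evaluate_per_kg(runs):
    """`runs` maps a KG name to a (predicted, gold) pair; returns scores per KG."""
    return {kg: precision_recall_f1(pred, gold) for kg, (pred, gold) in runs.items()}


if __name__ == "__main__":
    # Made-up annotations for one hypothetical KG.
    gold = {(0, 0): "a", (1, 0): "b", (2, 0): "c", (3, 0): "d"}
    predicted = {(0, 0): "a", (1, 0): "x", (2, 0): None}
    for kg, (p, r, f1) in evaluate_per_kg({"kg_a": (predicted, gold)}).items():
        print(f"{kg}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Reporting the same metrics for every integrated KG makes it possible to see whether the approach generalizes or whether its accuracy depends on one particular data source.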

References:

[Shen et al 2021] Shen, W., Li, Y., Liu, Y., Han, J., Wang, J., & Yuan, X. (2021). Entity linking meets deep learning: Techniques and solutions. IEEE Transactions on Knowledge and Data Engineering.

[Yin et al 2019] Yin, X., Huang, Y., Zhou, B., Li, A., Lan, L., & Jia, Y. (2019). Deep entity linking via eliminating semantic ambiguity with BERT. IEEE Access, 7, 169434-169445.