Entity linking over tabular data using LLMs
Keywords: Entity linking, Data enrichment, Tabular data, LLMs, ChatGPT
Entity linking is a crucial task in Natural Language Processing (NLP) and Information Retrieval (IR) [Shen et al 2021]. It involves associating strings of text that refer to entities (known as mentions) with the corresponding entries in a knowledge graph or database. This process helps systems determine exactly which entity a mention refers to in a given context, which matters most when the mention is ambiguous. For instance, a text mention of "Paris" may refer to either "Paris, France" or "Paris, Texas," and entity linking resolves such ambiguity [Yin et al 2019].
In the context of structured data such as tables, entity linking consists of linking specific cells (mentions) to the corresponding entries in a knowledge graph. This entails identifying the cells that refer to entities and linking each of them to a specific entry in an external structured knowledge base. This process enriches the table with external information and context, making it more meaningful and comprehensive. For instance, in a table of movies, the cell "The Matrix" would be linked to its corresponding entry in a knowledge graph, which in turn provides additional information such as the film's director and release date.
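To make the table setting concrete, here is a minimal sketch, assuming a hypothetical in-memory knowledge base (KB, KBEntry) in which each entry carries a few context words standing in for its knowledge-graph facts. Candidates are retrieved by exact alias match, and the other cells in the same row serve as disambiguation context, so a "Paris" cell next to "France" and one next to "USA" resolve to different entries. The word-overlap score is a deliberately simple stand-in, not the method proposed in this thesis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KBEntry:
    id: str
    label: str
    aliases: set[str]
    context: set[str]  # words describing the entry (stand-in for KG facts)


# Hypothetical toy knowledge base; a real system would query a KG index.
KB = [
    KBEntry("kb:paris_fr", "Paris, France", {"paris"}, {"france", "europe", "capital"}),
    KBEntry("kb:paris_tx", "Paris, Texas", {"paris"}, {"texas", "usa", "lamar"}),
    KBEntry("kb:matrix_film", "The Matrix", {"the matrix", "matrix"}, {"film", "1999", "wachowski"}),
]


def candidates(mention: str) -> list[KBEntry]:
    """Candidate generation: exact (case-insensitive) alias or label match."""
    m = mention.strip().lower()
    return [e for e in KB if m in e.aliases or m == e.label.lower()]


def link_cell(table: list[list[str]], row: int, col: int) -> Optional[str]:
    """Link table[row][col] to a KB id, using the rest of the row as context."""
    mention = table[row][col]
    # Tokens from the other cells of the same row act as disambiguation context.
    row_context = {
        tok
        for j, cell in enumerate(table[row])
        if j != col
        for tok in cell.lower().split()
    }
    best, best_score = None, -1
    for entry in candidates(mention):
        score = len(entry.context & row_context)  # simple word-overlap score
        if score > best_score:  # ties keep the first candidate found
            best, best_score = entry, score
    return best.id if best else None  # None means NIL (no candidate)
```

For example, `link_cell([["Paris", "France"], ["Paris", "USA"]], 1, 0)` picks the Texas entry because "usa" overlaps with its context words.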
Given the increasing popularity and impressive results of Large Language Models (LLMs), and their potential for solving entity linking tasks [Shi et al 2023], this thesis aims to evaluate their efficacy for entity linking over structured data, along with the associated time costs. The objective of this work is to assess existing solutions, such as GPT, and to develop a custom, in-house model.
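One possible way to cast cell linking as an LLM task is as a multiple-choice question over retrieved candidates. The sketch below only builds the prompt and parses the reply; the functions build_prompt and parse_answer, the prompt wording, the candidate IDs, and the NIL convention are illustrative assumptions, and the call to the model itself (through a hosted chat API or a local open-source model) is deliberately left out.

```python
import re
from typing import Optional

Candidate = tuple[str, str]  # (candidate id, short description)


def build_prompt(header: list[str], row: list[str], col: int,
                 candidates: list[Candidate]) -> str:
    """Format one cell-linking query as a multiple-choice prompt."""
    # The whole row is included so the model can use it as context.
    row_desc = "; ".join(f"{h}: {v}" for h, v in zip(header, row))
    options = "\n".join(f"- {cid}: {desc}" for cid, desc in candidates)
    return (
        "You are linking table cells to knowledge-graph entities.\n"
        f"Row: {row_desc}\n"
        f"Cell to link: column '{header[col]}', value '{row[col]}'\n"
        f"Candidates:\n{options}\n"
        "Answer with exactly one candidate ID, or NIL if none applies."
    )


def parse_answer(reply: str, candidates: list[Candidate]) -> Optional[str]:
    """Map the model's free-text reply to the earliest-mentioned candidate id."""
    positions = {}
    for cid, _ in candidates:
        match = re.search(re.escape(cid), reply)
        if match:
            positions[cid] = match.start()
    if not positions:
        return None  # covers "NIL" and unparseable replies
    return min(positions, key=positions.get)
```

Constraining the model to a fixed candidate list keeps its output checkable against the knowledge graph; the trade-off is that linking quality is bounded by the candidate-generation step.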
Work to be done:
- Try existing solutions such as ChatGPT and evaluate how they perform in this context.
- Propose an open-source LLM-based solution that performs well.
- Validate the proposed solution, in particular by comparing standard approaches with LLM-based approaches.
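For the comparison between standard and LLM-based approaches, both kinds of system can be scored against the same gold annotations. A minimal sketch, assuming predictions and gold labels are dictionaries keyed by hypothetical (row, column) cell coordinates, with None marking a NIL prediction; precision is computed over submitted (non-NIL) annotations and recall over all gold cells, which is one common convention for cell-entity annotation.

```python
from typing import Optional

Cell = tuple[int, int]  # (row, column) coordinates, an assumed keying scheme


def evaluate(gold: dict[Cell, str],
             pred: dict[Cell, Optional[str]]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for cell-level entity linking."""
    # Only non-NIL predictions count as submitted annotations.
    submitted = {cell: eid for cell, eid in pred.items() if eid is not None}
    correct = sum(1 for cell, eid in submitted.items() if gold.get(cell) == eid)
    precision = correct / len(submitted) if submitted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Running `evaluate` on each system's output, and timing each run (for instance with `time.perf_counter`), yields both the accuracy and the time-cost figures the comparison calls for.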
[Shen et al 2021] Shen, W., Li, Y., Liu, Y., Han, J., Wang, J., & Yuan, X. (2021). Entity linking meets deep learning: Techniques and solutions. IEEE Transactions on Knowledge and Data Engineering.
[Yin et al 2019] Yin, X., Huang, Y., Zhou, B., Li, A., Lan, L., & Jia, Y. (2019). Deep entity linking via eliminating semantic ambiguity with BERT. IEEE Access, 7, 169434-169445.
[Shi et al 2023] Shi, S., Xu, Z., Hu, B., & Zhang, M. (2023). Generative Multimodal Entity Linking. arXiv preprint arXiv:2306.12725.