Applying Deep Learning and Semantic Technologies for Data Integration

Master Project
Data integration (or data interoperability) remains a major problem in industry and creates considerable overhead in digitalization projects. Relevant examples of data integration problems include entity matching (i.e., linking records to entities) and schema alignment (i.e., aligning types and attributes from multiple sources).
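To make entity matching concrete, the following minimal sketch links records from two sources by name similarity. It is a plain string-similarity baseline built on Python's standard library, not a deep learning method; the record data, the function names and the 0.8 threshold are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_entities(source_a: dict, source_b: dict, threshold: float = 0.8) -> list:
    """Link each record in source_a to its most similar record in source_b.

    A pair is kept only if its similarity reaches the (assumed) threshold.
    """
    matches = []
    for id_a, name_a in source_a.items():
        best_id, best_score = None, 0.0
        for id_b, name_b in source_b.items():
            score = similarity(name_a, name_b)
            if score > best_score:
                best_id, best_score = id_b, score
        if best_id is not None and best_score >= threshold:
            matches.append((id_a, best_id, best_score))
    return matches


# Hypothetical records from two systems that describe overlapping companies.
crm = {"a1": "Acme Corp.", "a2": "Nordic Energy AS"}
erp = {"b1": "ACME Corp", "b2": "Baltic Shipping"}

links = match_entities(crm, erp)
print(links)
```

In this toy data only the two spellings of the same company are linked. Hand-written rules of this kind tend to break down on noisy or heterogeneous records, which is the gap that learned matchers aim to close.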
Tailor-made solutions are currently implemented and deployed for particular problems, projects or organisations. These solutions are expensive to develop and maintain, and they do not generalize to a larger range of problems and projects.
Large companies such as Amazon, Google, Apple and IBM are applying deep learning algorithms and semantic technologies (e.g., knowledge graphs, ontologies and graph databases) to enable a higher degree of automation for data integration problems. However, these techniques are still not known or adopted by many companies and public institutions.
Research Topic Focus
The goal of this project is to explore the applicability of deep learning techniques and semantic technologies to solve data integration problems in real national or European projects aiming to create digital twins and analytics platforms for different domains such as Energy, Manufacturing, Maritime and Biology.
Expected Results and Learning Outcome
After the thesis is successfully submitted and defended, the student should have a better understanding of, and practical experience with, deep learning techniques and semantic technologies. The student should also be better able to support IT teams in dealing with data integration problems relevant to companies and public institutions (including academic institutions).
Qualifications
Candidates should have a good understanding of deep learning techniques, data engineering, and semantic technologies. Programming experience in Python is also recommended, with libraries for data processing (e.g., Pandas, SQLAlchemy), data analytics (e.g., NumPy, Scikit-learn, TensorFlow, PyTorch) and data visualisation (e.g., Matplotlib, Seaborn).
Some relevant courses at UiO: TEK5040, IN3060, IN2090, IN5800 and IN3110.
References
Li Y., et al., 2020. Deep entity matching with pre-trained language models.
Li Y., et al., 2021. Deep entity matching: Challenges and opportunities.
Tan W.C., 2021. Deep data integration.
Weikum G., et al., 2021. Machine knowledge: Creation and curation of comprehensive knowledge bases.