From Code to Concept: A Semantic Approach to AI Innovation Discovery in Open Source Software Repositories

Abstract

Artificial Intelligence (AI) is a transformative force driving innovation, yet tracking AI-related advancements remains challenging due to the rapid pace of development and unstructured data from platforms like GitHub. This paper proposes an AI-driven approach to innovation detection, leveraging GitHub as a data source to systematically identify and link AI projects to organizations. Key contributions include a domain-specific taxonomy comprising 7,490 AI topics, a modular pipeline for semantic annotation and entity linking, and a trend detection framework based on Singular Spectrum Analysis (SSA). A knowledge graph is constructed to represent relationships among AI topics, projects, and companies, thereby enabling structured innovation tracking. The approach addresses challenges such as data sparsity and noise, demonstrating strengths in semantic annotation and topic categorization. Results highlight the potential for accurately detecting AI innovations and linking them to organizational entities, offering valuable insights for researchers, companies, and policymakers. This work contributes a scalable, automated approach for AI innovation tracking, with future directions focusing on refining entity linking and expanding the knowledge graph to capture emerging trends.

Category

Academic article

Language

English

Author(s)

  • Inna Novalija
  • Dumitru Roman
  • Federico Belotti
  • Vladimir Alexiev
  • Luis Rei
  • Roberto Avogadro
  • Babak Khalilvandian
  • Boyan Bechev
  • Catalina Alexandra Chinie
  • Iulia Ciurea
  • Janez Brank
  • Cosmin Udroiu
  • Ahmet Soylu
  • Matteo Palmonari

Affiliation

  • SINTEF Digital
  • Kristiania University of Applied Sciences

Year

2025

Published in

IEEE Access

Volume

13

Page(s)

130014 - 130014
