Innovation Knowledge Graph for understanding the AI innovation lifecycle

The main goal of the thesis is to build an “Innovation Knowledge Graph” by joining several independent streaming data sources that cover the main phases of the innovation lifecycle. The aim is to interconnect, model and understand the relevant aspects of the global innovation ecosystem, from the inception of ideas, through their potential realization in business, up to decision making at the policy level.

Keywords: Knowledge graphs, AI, datasets, data pipelines

In particular, the resulting knowledge graph should support a narrative of stages that can be understood as the “journey of an innovation” through the innovation lifecycle:

(1) an innovation typically appears in the academic world; (2) projects are started around the innovation; (3) the innovation may be patented; (4) companies are established around the innovation; (5) companies receive investments, possibly in several rounds; (6) investments influence the job market; (7) the market reacts to the quality and possible impact of the innovation; (8) public and expert perception forms; (9) media start publishing about the innovation and the companies; (10) educational institutions integrate the innovation into their curricula; (11) policy makers regulate the innovation; and (12) to close the cycle, funding agencies create new funding opportunities that make space for follow-up innovations.

Each of the above stages has its own stakeholders (from scientists to policy makers), who contribute to the “journey” in their specific ways. The aim of the thesis is to create a holistic model of the innovation ecosystem (in the form of an evolving knowledge graph), identify interdependencies and influences between the stages, and provide a software prototype that can be evaluated by the stakeholders involved in the innovation process.

Because the global innovation ecosystem is so broad, the thesis will focus on the narrower field of AI, which spreads horizontally across many areas of research and technology and impacts society in various ways.
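The twelve stages listed above could be encoded as an ordered, cyclic enumeration. The following is only a minimal sketch: the names `Stage` and `next_stage`, as well as the individual member names, are assumptions introduced for illustration and are not part of the thesis description.

```python
from enum import Enum


class Stage(Enum):
    """The twelve lifecycle stages, numbered as in the text (names are illustrative)."""
    ACADEMIC_RESEARCH = 1
    PROJECTS = 2
    PATENTS = 3
    COMPANY_FORMATION = 4
    INVESTMENT = 5
    JOB_MARKET = 6
    MARKET_REACTION = 7
    PUBLIC_PERCEPTION = 8
    MEDIA_COVERAGE = 9
    EDUCATION = 10
    POLICY = 11
    FUNDING = 12


def next_stage(stage: Stage) -> Stage:
    """Return the following stage; the last stage wraps around to the first."""
    return Stage(stage.value % len(Stage) + 1)
```

Wrapping from the funding stage back to academic research mirrors the "closing of the cycle" in stage (12). A strictly linear order is a simplification, since real innovations may skip or repeat stages.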

The Innovation Knowledge Graph will have to encode concepts and their relations on multiple levels of abstraction, in a temporal fashion, and be designed to support analytics tasks (e.g., prediction, causal modelling). The nodes of such an Innovation Knowledge Graph will have to encode both explicit concepts (e.g., expressible with Wikidata concepts) and implicit concepts (e.g., created via statistical models or embeddings extracted from data). The nodes will be connected either via explicit edge relations or via an implicit logic formula or probabilistic model that decides whether a connection between two nodes exists.
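One possible way to represent these requirements is sketched below, under several assumptions: all class, field and function names are hypothetical, temporal validity is modelled as a simple date interval, and a cosine-similarity threshold stands in for the probabilistic edge model, which the description leaves open.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Node:
    """A concept in the graph (illustrative field names)."""
    node_id: str
    label: str
    # Explicit concepts carry an external identifier (e.g., a Wikidata ID);
    # implicit concepts carry only an embedding derived from data.
    external_id: Optional[str] = None
    embedding: Optional[Tuple[float, ...]] = None

    @property
    def is_explicit(self) -> bool:
        return self.external_id is not None


@dataclass(frozen=True)
class Edge:
    """An explicit relation between two nodes, valid over a date interval."""
    source: str
    target: str
    relation: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the relation is still valid

    def holds_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day <= self.valid_to)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def implicit_edge_exists(a: Node, b: Node, threshold: float = 0.8) -> bool:
    """Assumed stand-in for a probabilistic model: link nodes with similar embeddings."""
    if a.embedding is None or b.embedding is None:
        return False
    return cosine_similarity(a.embedding, b.embedding) >= threshold
```

Keeping explicit edges as stored records and implicit edges as a function evaluated on demand is one design option. A real system might instead materialize implicit edges together with their probabilities.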

Examples of key data sources to be enriched and interconnected as part of the Knowledge Graph include:

  1. Academic publications (e.g., Microsoft Academic Graph, OpenAlex, LENS)
  2. Patents (e.g., Microsoft Academic Graph, Google Patents, LENS)
  3. Projects (e.g., publicly funded projects from EU, US, etc.)
  4. Companies and investments (e.g., CrunchBase, PDL database)
  5. Job market (demand side) (e.g., EURAXESS)
  6. Mainstream media
  7. Economic indicators (e.g., WorldBank, OECD)
  8. Policy documents (e.g., OECD database)
  9. Public perception (e.g., Twitter/X)
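Interconnecting such sources requires deciding when records from different datasets refer to the same entity. The sketch below illustrates one very simple approach, matching organisations by a normalized name. The record layout (`source`, `name`), the list of legal suffixes and both function names are assumptions made for illustration; none of the listed data sources is guaranteed to expose data in this shape.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Illustrative, incomplete list of legal suffixes to ignore when matching names.
LEGAL_SUFFIXES = {"inc", "ltd", "gmbh", "llc", "corp"}


def normalize_name(name: str) -> str:
    """Lowercase the name, drop punctuation and remove common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def link_records(records: Iterable[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group records by normalized name and keep groups spanning several sources."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for record in records:
        groups[normalize_name(record["name"])].append(record)
    return {
        key: recs
        for key, recs in groups.items()
        if len({r["source"] for r in recs}) > 1
    }


# Hypothetical records; the organisations and source labels are made up.
records = [
    {"source": "patents", "name": "Example AI, Inc."},
    {"source": "companies", "name": "example ai"},
    {"source": "jobs", "name": "Other Labs GmbH"},
]
linked = link_records(records)
```

Exact matching on normalized names is deliberately naive. In practice, shared identifiers (where a source provides them) or fuzzy and embedding-based matching would likely be needed.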

Work to be done:

  • Design the innovation knowledge graph.
  • Identify and assess relevant data sources.
  • Integrate data from the selected data sources and validate the knowledge graph.