Semantic Enabled Federated Catalogue Services for Open Data

Academic lecture
Open data fuels the development of new and innovative ICT solutions towards the vision of smart and sustainable cities. More and more public data are becoming available, both historical data and real-time data, e.g., from sensors. Datasets have been opened through various channels. Some are published in portals and catalogues, e.g., in Norway through the national portals data.norge.no and geonorge.no, while others are published on the data owners' own websites.

However, there are currently some challenges to the use of these data. There is a lack of overview of the available open data. The published data descriptions are often imprecise or incomplete, so it is time-consuming to browse a large number of datasets to identify the relevant ones. In addition, there are no common data models or Application Programming Interfaces (APIs). This prevents software programs from searching automatically to discover and use open data dynamically.

To improve this situation and facilitate the development of innovative applications, federated catalogue services are needed that search datasets across different catalogues and find those most suitable to the user's needs. Semantic search based on ontologies is a promising approach to increase search quality and efficiency and to enable automatic search. Such semantic and automatic search across different catalogues is not yet available, either nationally or internationally.

CKAN is an open-source data portal platform widely used to implement catalogue systems for open government and smart city data. CKAN does not provide ontology-based semantic search. Therefore, we have implemented a prototype of semantic-enabled federated catalogue services based on CKAN. We follow the lean startup method, developing the components iteratively, with feedback collected automatically by mechanisms integrated in the prototype.
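
For orientation, the keyword search that CKAN does provide is exposed through its Action API as `package_search`. The sketch below shows how a federated service might build such requests for several catalogues and merge the JSON responses. It is a minimal illustration, not the prototype's code: the catalogue URLs are hypothetical placeholders, the helper names are invented for this example, de-duplicating by dataset `name` is an assumption, and no network call is made.

```python
from urllib.parse import urlencode

# Hypothetical catalogue base URLs (placeholders, not real portals).
CATALOGUES = ["https://catalogue-a.example", "https://catalogue-b.example"]


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API keyword-search URL for one catalogue."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def merge_results(responses):
    """Merge package_search responses keyed by catalogue URL.

    Datasets are de-duplicated by their CKAN 'name' field (an assumption
    made for this sketch); each entry records which catalogues listed it.
    """
    merged = {}
    for catalogue, response in responses.items():
        if not response.get("success"):
            continue  # skip catalogues that reported an error
        for dataset in response["result"]["results"]:
            entry = merged.setdefault(
                dataset["name"],
                {"title": dataset.get("title", ""), "sources": []},
            )
            entry["sources"].append(catalogue)
    return merged


if __name__ == "__main__":
    for base in CATALOGUES:
        print(package_search_url(base, "traffic flow"))
```

Fetching each URL and passing the decoded JSON bodies to `merge_results` would give one combined result list, which still relies on keyword matching only.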

The prototype consists of three components:
• A harvester plugin to import dataset descriptions from different catalogues.
• A semantic plugin for management of ontologies, annotation of datasets with ontology concepts, and semantic search based on ontologies. APIs are available for semantic search to enable automatic search.
• A front-end user interface to demonstrate the semantic search, including the search results, a visualization of the ontology graph, and the annotations.
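
The idea behind the semantic plugin can be sketched as follows: datasets are annotated with ontology concepts, and a search for a concept also returns datasets annotated with its narrower concepts. This is a minimal sketch under stated assumptions, not the prototype's implementation: the toy ontology, the dataset identifiers, and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical toy ontology: concept -> narrower (child) concepts.
ONTOLOGY = {
    "Transport": ["RoadTransport", "PublicTransport"],
    "RoadTransport": ["TrafficFlow", "RoadNetwork"],
    "PublicTransport": ["BusStop"],
}

# Hypothetical annotations: dataset id -> ontology concepts attached to it.
ANNOTATIONS = {
    "traffic-counts": {"TrafficFlow"},
    "bus-stops": {"BusStop"},
    "road-segments": {"RoadNetwork"},
    "air-quality": {"AirQuality"},
}


def expand_concept(concept, ontology):
    """Return the concept together with all of its narrower concepts."""
    seen = {concept}
    queue = deque([concept])
    while queue:  # breadth-first walk down the concept hierarchy
        current = queue.popleft()
        for child in ontology.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def semantic_search(concept, ontology, annotations):
    """Return sorted ids of datasets annotated with the concept or a narrower one."""
    wanted = expand_concept(concept, ontology)
    return sorted(ds for ds, concepts in annotations.items() if concepts & wanted)


if __name__ == "__main__":
    print(semantic_search("RoadTransport", ONTOLOGY, ANNOTATIONS))
```

A keyword search for "road transport" would miss a dataset described only as "traffic counts"; with the annotation and the concept hierarchy, the same dataset is found through its `TrafficFlow` concept.
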
To verify the idea, the transport domain was selected as the application area, and three transport-related use cases have been developed to identify the supporting services and relevant data required. As ontologies are vital to this approach, we are defining and elaborating an ontology based on the information models of transport standards and widely used systems, e.g., NVDB, INSPIRE, and DATEX.

Hackathons are planned for the autumn to allow users and developers from start-ups, SMEs and city authorities to test and experiment with the prototype, and to gather feedback on using the prototype and on issues related to opening and using data, e.g., which data need to be opened and what quality problems or errors the data contain. In this way, we hope to contribute to lowering the barriers to publishing and using open data, and to facilitating innovative services for smart cities.
  • Research Council of Norway (RCN) / Open Transport Data - 257153
  • SINTEF Digital / Software Engineering, Safety and Security
NTNU Sustainability Science Conference
18.10.2017 - 20.10.2017