
Data Science: Data prep | Data pipelines

Data science MSc topics related to intelligent data preparation and support for data pipelines.

Photo: Mikael Blomkvist /
  • Data preparation (also known as data prep, wrangling, or transformation) is the most time-consuming phase in data science projects. Potential MSc thesis topics explore intelligent mechanisms that support data scientists in this phase. Examples include: use of machine learning to suggest data transformations; automated data quality assessment and recommendations for improving data quality; application of semantics and formal reasoning; support for data extension, enrichment, and interlinking; and use of (knowledge) graph representation and analytics techniques.
  • Data pipelines are composite pipelines for processing data with non-trivial properties and characteristics (commonly referred to as the Vs of Big Data, e.g. volume, velocity, and variety) and form the backbone of data science projects. Potential MSc thesis topics explore mechanisms that support both data scientists and domain experts across the complete lifecycle of managing data pipelines, covering AI-driven data pipeline discovery, languages for data pipeline modeling, and techniques for simulation, deployment, and adaptation of data pipelines. Of particular interest could be support for data pipelines on the Computing Continuum, i.e. how heterogeneous infrastructures such as cloud, fog, and edge could be used to support the complete data pipeline lifecycle.

The above topics will be applied in domains and contexts such as Digital Twins, Smart Cities, and Industry 4.0.

Contact & Questions