Data Locality for Big Data Pipelines on the Computing Continuum
This thesis will investigate novel software architectures for container-centric big data workflow orchestration systems that take data locality information into account by default, and will explore the use of software containers in this context.
With the advancement of the edge computing paradigm, data processing and the associated data pipelines run on heterogeneous, geographically distributed infrastructure. This calls for software solutions that can schedule data processing so as to reduce data transfers over long distances (data locality). Existing big data solutions are limited in their ability to handle data locality and are inefficient at processing the small, frequent events typical of edge environments.
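To make the notion of data locality concrete, the following is a minimal illustrative sketch of locality-aware placement, not the architecture this thesis will propose. It assumes a pipeline step reads a single input dataset stored at one site, and that a relative per-GB transfer cost between sites is known. The site names, the cost table, and the names `Node`, `transfer_cost` and `pick_node` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    site: str        # hypothetical site label, e.g. "edge-a" or "cloud"
    free_slots: int  # number of containers the node can still accept


# Hypothetical relative cost of moving 1 GB between two sites (symmetric).
COST_PER_GB = {
    ("edge-a", "edge-b"): 4.0,
    ("edge-a", "cloud"): 10.0,
    ("edge-b", "cloud"): 10.0,
}


def transfer_cost(src: str, dst: str, size_gb: float) -> float:
    """Cost of moving size_gb from src to dst; zero when the data is already local."""
    if src == dst:
        return 0.0
    per_gb = COST_PER_GB.get((src, dst), COST_PER_GB.get((dst, src), float("inf")))
    return per_gb * size_gb


def pick_node(nodes: list[Node], data_site: str, size_gb: float) -> Optional[Node]:
    """Pick the node with free capacity that minimises the data transfer cost."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda n: transfer_cost(data_site, n.site, size_gb))


NODES = [
    Node("n1", "cloud", free_slots=4),
    Node("n2", "edge-a", free_slots=1),
    Node("n3", "edge-b", free_slots=2),
]

# A step whose 2 GB input lives at "edge-a" is placed on the node at that site.
chosen = pick_node(NODES, data_site="edge-a", size_gb=2.0)
print(chosen.name if chosen else "no capacity")
```

A real orchestrator would also have to weigh compute capacity, the many small events typical of edge workloads, and data that is replicated or split across sites. The sketch isolates only the transfer-cost term.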
This thesis will build upon a 2021 thesis by Alin Corodescu: thesis, presentation.