
Big data workflows: Locality-aware orchestration using software containers

Abstract

The emergence of the Edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Data processing solutions must therefore consider data locality to reduce the performance penalties caused by data transfers among remote data centres. Existing Big Data processing solutions provide limited support for handling data locality and are inefficient at processing the small and frequent events characteristic of Edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric Big Data workflow orchestration that puts data locality at the forefront. The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. We compare the proposed solution with Argo Workflows and demonstrate a significant improvement in execution speed when processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyse the individual aspects affecting the performance of the overall solution.
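
To make the locality idea concrete, the following is a minimal sketch, assuming a simplified model in which each data unit is labelled with the site holding it and each long-lived container is represented by a worker object pinned to one site. The names `DataUnit`, `Worker` and `dispatch`, and the least-loaded fallback rule, are hypothetical illustrations chosen for this sketch; they are not the authors' implementation and not part of the Argo Workflows API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataUnit:
    """Hypothetical model: a unit of data and the site where it currently resides."""
    uid: str
    site: str


@dataclass
class Worker:
    """Hypothetical stand-in for a long-lived container running one workflow step at a site."""
    name: str
    site: str
    processed: list = field(default_factory=list)

    def run(self, unit: DataUnit) -> None:
        # A real container would execute the step's logic; here we only record the assignment.
        self.processed.append(unit.uid)


def dispatch(units, workers):
    """Send each unit to a worker at its own site when possible (illustrative policy).

    Falls back to the least-loaded worker anywhere when no local worker exists.
    Returns how many units required a cross-site transfer.
    """
    if not workers:
        raise ValueError("at least one worker is required")
    by_site = defaultdict(list)
    for w in workers:
        by_site[w.site].append(w)
    remote = 0
    for unit in units:
        candidates = by_site.get(unit.site)
        if not candidates:
            candidates = workers  # no local worker: accept a transfer to a remote site
            remote += 1
        # Among the candidates, pick the worker with the fewest units so far (first one on ties).
        target = min(candidates, key=lambda w: len(w.processed))
        target.run(unit)
    return remote


if __name__ == "__main__":
    workers = [Worker("w-oslo", "oslo"), Worker("w-trondheim", "trondheim")]
    units = [DataUnit("a", "oslo"), DataUnit("b", "trondheim"), DataUnit("c", "bergen")]
    print(dispatch(units, workers))  # 1: only "c" has no worker at its own site
    print([(w.name, w.processed) for w in workers])
```

Keeping the workers alive across many data units, instead of starting a fresh container per step, is what this sketch assumes would avoid per-event start-up costs for small, frequent events; the placement rule itself is deliberately simple and stands in for whatever policy a real orchestrator would use.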

Category

Academic article

Client

  • Research Council of Norway (RCN) / 309691
  • EC/H2020 / 101016835

Language

English

Author(s)

Affiliation

  • University of Oslo
  • SINTEF Digital / Software and Service Innovation
  • Norwegian University of Science and Technology
  • OsloMet - Oslo Metropolitan University
  • Royal Institute of Technology

Date

08.12.2021

Year

2021

Published in

Sensors

ISSN

1424-8220

Publisher

MDPI

Volume

21

Issue

24

Page(s)

1 - 27
