
Scalable Execution of Big Data Workflows using Software Containers

Abstract

Big Data processing involves handling large and complex data sets and incorporating different tools and frameworks, as well as other processes that help organisations make sense of data collected from various sources. This set of operations, referred to as Big Data workflows, requires taking advantage of the elasticity of cloud infrastructures for scalability. In this paper, we present the design and prototype implementation of a Big Data workflow approach based on the use of software container technologies and message-oriented middleware (MOM) to enable highly scalable workflow execution. The approach is validated in a use case and a set of experiments that demonstrate the practical applicability of the proposed approach for the scalable execution of Big Data workflows. Furthermore, we present a scalability comparison of our proposed approach with that of Argo Workflows, one of the most prominent tools in the area of Big Data workflows.
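To illustrate the execution model the abstract refers to, the sketch below shows a minimal containerised workflow-step worker that pulls tasks from a message queue and publishes its results to the queue of the next step. This is not the authors' implementation: it assumes RabbitMQ as the MOM and the Python pika client, and the queue names (step-input, step-output) are illustrative only.

    import json
    import pika

    # Connect to the message-oriented middleware (assumed here: RabbitMQ).
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
    channel = connection.channel()

    # Durable queues for this step's input tasks and its results (illustrative names).
    channel.queue_declare(queue="step-input", durable=True)
    channel.queue_declare(queue="step-output", durable=True)

    # Process one task at a time so additional container replicas share the load evenly.
    channel.basic_qos(prefetch_count=1)

    def handle_task(ch, method, properties, body):
        task = json.loads(body)
        # ... run the actual data-processing logic of this workflow step here ...
        result = {"task_id": task.get("id"), "status": "done"}
        # Hand the result to the next step via its queue.
        ch.basic_publish(exchange="", routing_key="step-output", body=json.dumps(result))
        # Acknowledge only after the work is done, so a failed worker causes redelivery.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="step-input", on_message_callback=handle_task)
    channel.start_consuming()

Under these assumptions, scaling a workflow step amounts to running more replicas of the container image consuming from that step's queue, which is the kind of scalability property the paper's experiments evaluate.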

Category

Academic chapter

Language

English

Author(s)

Affiliation

  • SINTEF Digital / Sustainable Communication Technologies
  • Royal Institute of Technology

Year

2020

Publisher

ACM Publications

Book

MEDES '20: Proceedings of the 12th International Conference on Management of Digital EcoSystems

ISBN

9781450381154

Page(s)

76–83
