Abstract
This paper presents a survey of existing tools for Big Data pipeline orchestration based on a comparative framework developed in the DataCloud project. We propose criteria for evaluating the tools to support reusability, flexible pipeline communication modes, and separation of concerns in Big Data pipeline descriptions. This survey aims to identify research and technological gaps and to recommend approaches for filling them. Further work in the DataCloud project is oriented towards the design, implementation, and practical evaluation of the recommended approaches.