Smart Data Placement for Big Data Pipelines: An Approach based on the Storage-as-a-Service Model

Abstract

The development of big data pipelines is a challenging task, especially when data storage is considered as part of the data pipelines. Local storage is expensive, hard to maintain, and comes with several challenges (e.g., data availability, data security, and backup). The use of cloud storage, i.e., Storage-as-a-Service (StaaS), instead of local storage has the potential to provide more flexibility in terms of scalability, fault tolerance, and availability. In this paper, we propose a generic approach to integrating StaaS with data pipelines, in which computation runs on an on-premise server or on a specific cloud while storage is provided through StaaS, and develop a ranking method for available storage options based on five key parameters: cost, proximity, network performance, the impact of server-side encryption, and user weights. The evaluation demonstrates the effectiveness of the proposed approach in terms of data transfer performance and the feasibility of dynamic selection of a storage option based on four primary user scenarios.
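
To make the idea of ranking storage options more concrete, the following is a minimal sketch of a weighted-sum ranking over the parameters named in the abstract. It is not the method from the paper, whose exact formula is not given here. The class `StorageOption`, its fields and units, the min-max normalization, and the weight keys are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    """Hypothetical description of one StaaS candidate (fields are assumptions)."""
    name: str
    cost: float             # assumed unit: USD per GB per month (lower is better)
    distance_km: float      # assumed proximity measure (lower is better)
    throughput_mbps: float  # assumed network performance measure (higher is better)
    sse_overhead: float     # assumed fractional slowdown from server-side encryption


def normalize(values, higher_is_better):
    """Min-max scale values to [0, 1], where 1 is always the best candidate."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # all candidates tie on this criterion
    if higher_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def rank(options, weights):
    """Return (option, score) pairs sorted from best to worst.

    `weights` maps the assumed keys "cost", "proximity", "network", and
    "encryption" to non-negative user weights; they are rescaled to sum to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    criteria = {
        "cost": ([o.cost for o in options], False),
        "proximity": ([o.distance_km for o in options], False),
        "network": ([o.throughput_mbps for o in options], True),
        "encryption": ([o.sse_overhead for o in options], False),
    }
    scores = [0.0] * len(options)
    for key, (values, higher_is_better) in criteria.items():
        share = weights.get(key, 0.0) / total
        for i, scaled in enumerate(normalize(values, higher_is_better)):
            scores[i] += share * scaled
    return sorted(zip(options, scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up candidates, for illustration only.
    candidates = [
        StorageOption("option-a", 0.020, 100.0, 200.0, 0.05),
        StorageOption("option-b", 0.010, 2000.0, 50.0, 0.10),
        StorageOption("option-c", 0.030, 500.0, 400.0, 0.02),
    ]
    user_weights = {"cost": 2, "proximity": 1, "network": 1, "encryption": 1}
    for option, score in rank(candidates, user_weights):
        print(f"{option.name}: {score:.3f}")
```

In this sketch the user weights act as the fifth parameter: changing them reorders the same set of candidates, which is one simple way to model different user scenarios.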

Category

Academic chapter

Language

English

Author(s)

Affiliation

  • SINTEF Digital / Sustainable Communication Technologies
  • Royal Institute of Technology
  • University of Klagenfurt (AAU)
  • Norwegian University of Science and Technology
  • OsloMet - Oslo Metropolitan University

Year

2022

Publisher

IEEE (Institute of Electrical and Electronics Engineers)

Book

15th IEEE/ACM International Conference on Utility and Cloud Computing

ISBN

9781665460873

Page(s)

317 - 320

View this publication at Norwegian Research Information Repository