Smart Data Placement for Big Data Pipelines: An Approach based on the Storage-as-a-Service Model

Abstract

The development of big data pipelines is a challenging task, especially when data storage is considered part of the pipeline. Local storage is expensive, hard to maintain, and comes with several challenges (e.g., data availability, data security, and backup). Using cloud storage, i.e., Storage-as-a-Service (StaaS), instead of local storage has the potential to provide more flexibility in terms of scalability, fault tolerance, and availability. In this paper, we propose a generic approach to integrating StaaS with data pipelines, in which computation runs on an on-premise server or on a specific cloud while data storage is delegated to StaaS, and we develop a ranking method for available storage options based on five key parameters: cost, proximity, network performance, the impact of server-side encryption, and user weights. The evaluation demonstrates the effectiveness of the proposed approach in terms of data transfer performance and the feasibility of dynamically selecting a storage option based on four primary user scenarios.
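
As an illustration only, a ranking over the parameters named in the abstract could be sketched as a weighted sum of normalized criteria. The abstract does not give the paper's actual formula, so everything below is an assumption: the `StorageOption` fields, their units, the min-max normalization, and the reading of "user weights" as weights applied to the other four criteria are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    """Hypothetical description of one StaaS candidate (fields are assumptions)."""
    name: str
    cost: float                 # e.g. price per GB-month; lower is better
    distance_km: float          # proximity to the compute site; lower is better
    throughput_mbps: float      # measured network performance; higher is better
    encryption_overhead: float  # relative slowdown from server-side encryption; lower is better


def _normalize(values, higher_is_better):
    """Min-max scale values to [0, 1] so that 1.0 is always the best score."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # all candidates tie on this criterion
    if higher_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def rank_options(options, weights):
    """Return (name, score) pairs sorted best-first by weighted sum.

    `weights` maps criterion name to a non-negative user weight; missing
    criteria count as zero. This is a generic weighted-sum sketch, not the
    ranking method from the paper.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    criteria = {
        "cost": (False, [o.cost for o in options]),
        "proximity": (False, [o.distance_km for o in options]),
        "network": (True, [o.throughput_mbps for o in options]),
        "encryption": (False, [o.encryption_overhead for o in options]),
    }
    scores = [0.0] * len(options)
    for key, (higher_is_better, values) in criteria.items():
        w = weights.get(key, 0.0) / total
        for i, s in enumerate(_normalize(values, higher_is_better)):
            scores[i] += w * s
    names = [o.name for o in options]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Made-up candidates for illustration; the numbers are not from the paper.
candidates = [
    StorageOption("A", cost=0.01, distance_km=2000, throughput_mbps=100, encryption_overhead=0.10),
    StorageOption("B", cost=0.03, distance_km=50, throughput_mbps=900, encryption_overhead=0.05),
]
cost_first = rank_options(candidates, {"cost": 1.0})
speed_first = rank_options(candidates, {"network": 1.0})
```

Changing the weights changes the winner: a cost-focused user scenario prefers the cheaper option, while a performance-focused one prefers the faster option. This mirrors the idea of scenario-dependent selection in the abstract without claiming to reproduce its evaluation.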

Language

English

Author(s)

Akif Quddus Khan
Nikolay Nikolov
Mihhail Matskin
Radu Prodan
Hui Song
Titi Roman
Ahmet Soylu

Affiliation

SINTEF Digital / Sustainable Communication Technologies
Royal Institute of Technology
University of Klagenfurt (AAU)
Norwegian University of Science and Technology
OsloMet - Oslo Metropolitan University

Year

2022

Publisher

IEEE (Institute of Electrical and Electronics Engineers)

Book

15th IEEE/ACM International Conference on Utility and Cloud Computing

ISBN

9781665460873

Page(s)

317 - 320

DOI

https://doi.org/10.1109/ucc56403.2022.00056

Read fulltext

https://hdl.handle.net/11250/3062532
