Edge-based Data Profiling and Repair as a Service for IoT

Abstract

With the proliferation of IoT devices and the consequent exponential growth in data generation, ensuring data quality has become a critical challenge in IoT applications. Erroneous data can significantly impact the reliability and effectiveness of decision-making processes and downstream analytics. Leveraging the computational abilities of edge devices enables data profiling and repair tasks at the edge, allowing for immediate remediation of erroneous data within the data stream and improved scalability through distributed repair across multiple edge devices. Cloud-based data profiling and repair methods have been extensively researched, but limited computational resources constrain their applicability at edge/fog devices. To overcome this limitation and enhance generalizability, Machine Learning (ML) offers a promising solution, allowing sensor substitution, missing value prediction, and corrupt data replacement. ML-based data repair techniques can be flexibly deployed at the edge using containerized repair services for real-time data repair. In this paper, we propose and assess EDPRaaS (Edge-based Data Profiling and Repair as a Service), a novel approach designed for efficient data quality profiling and repair in IoT environments. EDPRaaS incorporates an ML-based data repair component, enabling real-time data repair at the edge. It leverages pandas profiling and Great Expectations tools for data profiling, providing comprehensive insights into the dataset and detecting data quality issues.

Category

Academic chapter

Language

English

Author(s)

Affiliation

  • SINTEF Digital / Sustainable Communication Technologies
  • Austria

Year

2024

Publisher

Association for Computing Machinery (ACM)

Book

IoT '23: Proceedings of the 13th International Conference on the Internet of Things

ISBN

9798400708541

Page(s)

17 - 24