Trends and Challenges in Data Cleaning for Large-Scale Systems: A Survey
DOI:
https://doi.org/10.63412/53kczv76
Keywords:
Automation in Data Cleaning, Cloud-Based Data Cleaning, Data Heterogeneity, Data Provenance, Explainable AI, Federated Data Cleaning, IoT Data Cleaning, Machine Learning in Data Cleaning, Metrics for Data Quality, Privacy-Preserving Data Cleaning, Real-Time Data Cleaning, Resource-Efficient Algorithms, Scalability, Standardized Benchmarks, Trends in Data Cleaning, Universal Metrics.
Abstract
Data cleaning is critical for maintaining the integrity and usability of large-scale systems that process massive, diverse, and dynamic datasets. As data ecosystems grow in scale and complexity, traditional cleaning techniques struggle with challenges such as data heterogeneity, real-time processing demands, and resource constraints. This paper presents a comprehensive survey of the latest trends and persistent challenges in data cleaning for large-scale systems. It examines advances in automated and AI-driven methods, distributed and cloud-based cleaning frameworks, and real-time error detection techniques for streaming data. The survey also highlights domain-specific cleaning approaches in sectors such as healthcare and finance, where data quality significantly affects decision-making and operational efficiency. Key challenges, including scalability bottlenecks, the lack of standardized benchmarks, and ethical considerations, are discussed in detail. Finally, the paper identifies open research directions, such as explainable AI in data cleaning, the development of universal metrics, and sustainable algorithms for resource-efficient processing. By synthesizing recent developments and emphasizing their role in improving decision-making, system performance, and user experience, this survey aims to guide researchers and practitioners toward innovative solutions for enhancing data quality in large-scale systems.
License
Copyright (c) 2025 International Journal of Global Innovations and Solutions

This work is licensed under a Creative Commons Attribution 4.0 International License.