Title
One-Pass Inconsistency Detection Algorithms For Big Data
Abstract
Real-world data is often dirty. Inconsistency is an important kind of dirty data, and it must be detected before it can be repaired. The time complexity of current inconsistency detection algorithms is super-linear in the size of the data, which makes them unsuitable for big data. In our previous work, we developed an algorithm for big data that detects inconsistencies in a single pass over the data with respect to both functional dependencies (FDs) and conditional functional dependencies (CFDs). In this paper, we propose inconsistency detection algorithms for FDs, CFDs, and denial constraints (DCs). DCs are more expressive than FDs and CFDs, so an algorithm that detects DC violations broadens the applicability of our inconsistency detection algorithms. We compare the performance of our algorithms with that of equivalent SQL queries run in MySQL and BigQuery. The experimental results indicate that our algorithms are highly efficient.
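The abstract does not describe how the one-pass detection works, so the following is only a minimal sketch of the general idea for a single FD, not the authors' algorithm. It assumes that a hash map from left-hand-side values to the first right-hand-side value seen fits in memory; the function name `fd_violations` and the sample `zip` and `city` attributes are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Mapping, Sequence, Tuple

Row = Mapping[str, Hashable]


def fd_violations(
    rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]
) -> List[Tuple[int, int]]:
    """Return (first_index, conflicting_index) pairs that violate lhs -> rhs.

    Each row is read exactly once. Memory grows with the number of
    distinct lhs values (an assumption of this sketch, not of the paper).
    """
    # Maps each lhs value to the index and rhs value of its first occurrence.
    first_seen: Dict[tuple, Tuple[int, tuple]] = {}
    violations: List[Tuple[int, int]] = []
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if key not in first_seen:
            first_seen[key] = (i, value)
        elif first_seen[key][1] != value:
            # Same lhs value, different rhs value: the FD is violated.
            violations.append((first_seen[key][0], i))
    return violations


# Hypothetical sample data: the FD zip -> city is violated by rows 0 and 2.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "60601", "city": "Chicago"},
]
print(fd_violations(rows, ["zip"], ["city"]))
```

A self-join or GROUP BY query in SQL would express the same check declaratively. The sketch trades that for a single scan plus a lookup table, which is the kind of trade-off the abstract alludes to.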
Year
2016
DOI
10.1109/ACCESS.2019.2898707
Venue
IEEE ACCESS
Keywords
Inconsistency detection, big data, one-pass algorithm, data quality, denial constraint
Field
Data mining, Data quality, Computer science, Algorithm, Functional dependency, Dirty data, Big data
DocType
Conference
Volume
7
ISSN
2169-3536
Citations
0
PageRank
0.34
References
7
Authors
4
Name            Order  Citations  PageRank
Meifan Zhang    1      0          1.69
Hongzhi Wang    2      421        73.72
Jianzhong Li    3      3196       304.46
Hong Gao        4      1086       120.07