Title
UGuide: User-Guided Discovery of FD-Detectable Errors.
Abstract
Error detection is the process of identifying problematic data cells that are different from their ground truth. Functional dependencies (FDs) have been widely studied in support of this process. Oftentimes, it is assumed that FDs are given by experts. Unfortunately, it is usually hard and expensive for the experts to define such FDs. In addition, automatic data profiling over dirty data in order to find correct FDs is known to be a hard problem. In this paper, we propose an end-to-end solution to detect FD-detectable errors from dirty data. The broad intuition is that given a dirty dataset, it is feasible to automatically find approximate FDs, as well as data that is possibly erroneous. Arguably, at this point, only experts can confirm true FDs or true errors. However, in practice, experts never have enough budget to find all errors. Hence, our problem is, given a limited budget of expert's time, which questions we should ask, either FDs, cells, or tuples, such that we can find as many data errors as possible. We present efficient algorithms to interact with the user. Extensive experiments demonstrate that our proposed framework is effective in detecting errors from dirty data.
Year
DOI
Venue
2017
10.1145/3035918.3064024
SIGMOD Conference
Field
DocType
Citations 
Data mining,Computer science,Intuition,Artificial intelligence,Data profiling,Dirty data,Ask price,Tuple,Functional dependency,Error detection and correction,Ground truth,Database,Machine learning
Conference
5
PageRank 
References 
Authors
0.40
24
5
Name
Order
Citations
PageRank
Saravanan Thirumuruganathan117034.15
Laure Berti-Equille258849.90
Mourad Ouzzani31213120.36
Jorge-arnulfo Quiané-ruiz498661.02
Nan Tang595459.62