Title
SCODED: Statistical Constraint Oriented Data Error Detection
Abstract
Statistical Constraints (SCs) play an important role in statistical modeling and analysis. This paper brings the concept to data cleaning and studies how to leverage SCs for error detection. SCs offer a novel approach that applies to a variety of scenarios and works harmoniously with downstream statistical modeling. Entailment relationships between SCs and integrity constraints provide analytical insight into SCs. We develop SCODED, an SC-Oriented Data Error Detection system comprising two key components: (1) SC Violation Detection, which checks whether an SC is violated on a given dataset, and (2) Error Drill Down, which identifies the top-k records that contribute most to the violation of an SC. Experiments on synthetic and real-world data show that, compared to state-of-the-art approaches, SCs are effective in detecting the data errors that violate them.
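The two components named in the abstract can be illustrated with a minimal sketch. It is not the SCODED implementation: the abstract does not say which tests or ranking strategy the system uses. The sketch assumes an independence SC between two categorical columns, a Pearson chi-squared statistic assessed by a permutation test, and a simple leave-one-out ranking for the drill-down. The function names `chi2_stat`, `violates_independence`, and `top_k_suspects` are hypothetical.

```python
import random
from collections import Counter


def chi2_stat(xs, ys):
    """Pearson chi-squared statistic for two equally long categorical columns."""
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    stat = 0.0
    for a in count_x:
        for b in count_y:
            expected = count_x[a] * count_y[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def violates_independence(xs, ys, alpha=0.05, n_perm=1000, seed=0):
    """Assumed violation check for the SC 'X is independent of Y'.

    Returns (violated, p_value), where p_value comes from a permutation
    test on the chi-squared statistic.
    """
    rng = random.Random(seed)
    observed = chi2_stat(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between the columns
        if chi2_stat(xs, shuffled) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return p_value < alpha, p_value


def top_k_suspects(xs, ys, k):
    """Assumed drill-down: rank records by how much removing each one
    alone lowers the statistic; ties are broken by record index."""
    xs, ys = list(xs), list(ys)
    base = chi2_stat(xs, ys)
    gains = []
    for i in range(len(xs)):
        rest = chi2_stat(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        gains.append((base - rest, i))
    gains.sort(key=lambda g: (-g[0], g[1]))
    return [i for _, i in gains[:k]]
```

The leave-one-out ranking recomputes the statistic once per record, so it is quadratic in the number of records and only suitable for small illustrations. Identical records receive identical scores, so this sketch cannot tell them apart.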
Year
2020
DOI
10.1145/3318464.3380568
Venue
SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, June 2020
DocType
Conference
ISBN
978-1-4503-6735-6
Citations
1
PageRank
0.35
References
39
Authors
5
Name | Order | Citations | PageRank
Jing Nathan Yan | 1 | 2 | 2.05
Oliver Schulte | 2 | 134 | 25.15
Mohan Zhang | 3 | 1 | 3.73
Jiannan Wang | 4 | 1109 | 45.38
Reynold Cheng | 5 | 3069 | 154.13