Title
NADEEF: a generalized data cleaning system
Abstract
We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements predefined classes. These classes uniformly define what is wrong with the data and (possibly) how to fix it. We will demonstrate the following features provided by NADEEF. (1) Heterogeneity: The programming interface can be used to express many types of data quality rules beyond the well-known CFDs (FDs), MDs and ETL rules. (2) Interdependency: The core algorithms can interleave multiple types of rules to detect and repair data errors. (3) Deployment and extensibility: Users can easily customize NADEEF by defining new types of rules, or by extending the core. (4) Metadata management and data custodians: We show a live data quality dashboard to effectively involve users in the data cleaning process.
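The split between a rule-facing programming interface and a core can be illustrated with a small sketch. This is not NADEEF's actual API: the names `Rule`, `Violation`, `Fix`, `FunctionalDependency` and `clean` are hypothetical, and the majority-vote repair and the fixed pass limit are assumptions chosen only to keep the example short. It shows a rule class that states what is wrong (`detect`) and, optionally, how to fix it (`repair`), plus a naive loop standing in for a core that interleaves rules.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """Cells that jointly break a rule (hypothetical type)."""
    rule: str
    cells: tuple  # (row_index, column) pairs


@dataclass(frozen=True)
class Fix:
    """A proposed new value for one cell (hypothetical type)."""
    row: int
    column: str
    value: str


class Rule(ABC):
    """Hypothetical rule interface: what is wrong, and possibly how to fix it."""

    @abstractmethod
    def detect(self, rows):
        """Return a list of Violation objects found in rows."""

    def repair(self, rows, violation):
        """Return a list of Fix objects; detection-only rules return none."""
        return []


class FunctionalDependency(Rule):
    """FD lhs -> rhs: rows that agree on lhs must agree on rhs."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def detect(self, rows):
        groups = defaultdict(list)
        for i, row in enumerate(rows):
            groups[row[self.lhs]].append(i)
        name = f"{self.lhs}->{self.rhs}"
        return [
            Violation(name, tuple((i, self.rhs) for i in idx))
            for idx in groups.values()
            if len({rows[i][self.rhs] for i in idx}) > 1
        ]

    def repair(self, rows, violation):
        # Assumption: repair by majority vote; ties go to the smallest value.
        values = [rows[i][col] for i, col in violation.cells]
        majority = max(sorted(set(values)), key=values.count)
        return [
            Fix(i, col, majority)
            for i, col in violation.cells
            if rows[i][col] != majority
        ]


def clean(rows, rules, max_passes=10):
    """Naive stand-in for a core: apply all rules until nothing changes."""
    rows = [dict(r) for r in rows]  # work on a copy
    for _ in range(max_passes):
        changed = False
        for rule in rules:
            for violation in rule.detect(rows):
                for fix in rule.repair(rows, violation):
                    rows[fix.row][fix.column] = fix.value
                    changed = True
        if not changed:
            break
    return rows


records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
]
cleaned = clean(records, [FunctionalDependency("zip", "city")])
```

Because the loop only depends on `detect` and `repair`, a different kind of rule (for example a matching or ETL-style rule) could be added as another `Rule` subclass without changing `clean`, which is the kind of extensibility the abstract describes.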
Year: 2013
DOI: 10.14778/2536274.2536280
Venue: PVLDB
Keywords: repair data error, programming interface, customize nadeef, generalized data, data quality rule, live data quality dashboard, etl rule, nadeef distinguishes, easy-to-deploy data, core algorithm, data custodian, dashboard
Field: Interdependence, Data mining, Data quality, Software deployment, Programming language, Computer science, Data type, Dashboard (business), Extensibility, Metadata management, Database, Generality
DocType: Journal
Volume: 6
Issue: 12
ISSN: 2150-8097
Citations: 14
PageRank: 0.77
References: 6
Authors
7
Name
Order
Citations
PageRank
Amr Ebaid11144.94
Ahmed K. Elmagarmid23720626.92
Ihab F. Ilyas32907117.27
Mourad Ouzzani41213120.36
Jorge-Arnulfo Quiané-Ruiz598661.02
Nan Tang695459.62
Si Yin7855.61