Title
Circumventing Data Quality Problems Using Multiple Join Paths
Abstract
We propose the Multiple Join Path (MJP) framework for obtaining high-quality information by linking fields across multiple databases when the underlying databases contain poor-quality data, characterized by violations of integrity constraints such as keys and functional dependencies within and across databases. MJP associates quality scores with candidate answers in two steps. First, it scores individual data paths between a pair of field values, taking into account data quality with respect to specified integrity constraints. Second, it agglomerates scores across the multiple data paths that serve as corroborating evidence for a candidate answer. We address the problem of finding the top-few (highest-quality) answers in the MJP framework using novel techniques, and demonstrate the utility of our techniques using real data and our Virtual Integration Prototype testbed.
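The two-step scoring described in the abstract can be illustrated with a small sketch. The scoring functions below are assumptions chosen for illustration, not the paper's definitions: each join step is assumed to carry a quality score in (0, 1] (lower when the joined tuples violate a key or functional dependency), a path's score is assumed to be the product of its step scores, and scores for the same candidate answer are assumed to be agglomerated with a noisy-OR, so each corroborating path raises the answer's score without exceeding 1. The names `path_score`, `aggregate`, and `top_k_answers` are hypothetical. The top-k step here naively scores every candidate and then selects with a heap; the paper's own top-few techniques are not reproduced.

```python
import heapq
from collections import defaultdict
from math import prod

# Hypothetical input: (candidate answer, per-step quality scores along one data path).
Path = tuple[str, list[float]]


def path_score(step_scores: list[float]) -> float:
    """Assumed path score: product of the per-step quality scores."""
    return prod(step_scores)


def aggregate(path_scores: list[float]) -> float:
    """Assumed agglomeration (noisy-OR): 1 - prod(1 - s) over corroborating paths."""
    return 1.0 - prod(1.0 - s for s in path_scores)


def top_k_answers(paths: list[Path], k: int) -> list[tuple[str, float]]:
    """Naive top-k: score every path, group by answer, aggregate, then pick the k best."""
    by_answer: dict[str, list[float]] = defaultdict(list)
    for answer, steps in paths:
        by_answer[answer].append(path_score(steps))
    scored = ((answer, aggregate(scores)) for answer, scores in by_answer.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Answer "A" is reached by two weaker paths (0.9 and 0.4); "B" by one path (0.92).
    paths = [("A", [1.0, 0.9]), ("A", [0.8, 0.5]), ("B", [1.0, 0.92])]
    for answer, score in top_k_answers(paths, 2):
        print(answer, round(score, 2))
```

With these assumed functions, answer "A" scores 1 - (0.1 × 0.6) = 0.94 and overtakes "B" (0.92), showing how agreement among several imperfect paths can outweigh a single stronger path.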
Year: 2006
Venue: CleanDB
Keywords: data quality, integrity constraints, functional dependency
Field: Data mining, Multiple data, Data quality, Testbed, Functional dependency, Data integrity, Virtual integration, Mathematics
DocType: Conference
Citations: 8
PageRank: 0.83
References: 11
Authors: 3
Yannis Kotidis (Order 1): Citations 1994, PageRank 208.82
Amélie Marian (Order 2): Citations 1280, PageRank 77.92
Divesh Srivastava (Order 3): Citations 8984, PageRank 1191.22