Title
Schema matching on streams with accuracy guarantees
Abstract
We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, exact calculation of these similarities requires processing all database records, which is infeasible for data streams. We devise a fast matching algorithm that uses only a small sample of records, yet is guaranteed to find a matching that closely approximates the matching that would be obtained if the entire stream were processed. The method can be applied to any given similarity metric, or combination of metrics, that can be estimated from a sample with bounded error; we apply the algorithm to several metrics. We give a rigorous proof of the method's correctness and report on experiments using large databases.
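The idea of estimating a similarity from a sample with bounded error can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a hypothetical similarity defined as the probability that independently drawn values of two attributes are equal, and it swaps in Hoeffding's inequality as a standard bound for a mean of values in [0, 1]. All function names are assumptions made for illustration.

```python
import math
import random


def hoeffding_radius(n, delta):
    """Half-width eps such that the mean of n i.i.d. values in [0, 1]
    lies within eps of its expectation with probability >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def sample_size(eps, delta):
    """Smallest n for which hoeffding_radius(n, delta) <= eps."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_similarity(draw_a, draw_b, n, delta):
    """Estimate P(a == b) for independent draws a ~ attribute A and
    b ~ attribute B. Returns (estimate, error radius).

    draw_a and draw_b are hypothetical zero-argument callables that
    each return one sampled attribute value.
    """
    hits = sum(1 for _ in range(n) if draw_a() == draw_b())
    return hits / n, hoeffding_radius(n, delta)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two made-up categorical attributes from different sources.
    country_a = lambda: rng.choice(["PL", "DE", "DE", "FR"])
    country_b = lambda: rng.choice(["PL", "DE", "FR", "FR"])

    n = sample_size(eps=0.05, delta=0.05)
    estimate, radius = estimate_similarity(country_a, country_b, n, 0.05)
    print(f"n={n}, similarity={estimate:.3f} +/- {radius:.3f}")
```

The sample size depends only on the desired error and confidence, not on the length of the stream, which is what makes a sample-based estimate usable when the full data cannot be processed.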
Year: 2008
DOI: 10.3233/IDA-2008-12302
Venue: Intell. Data Anal.
Keywords: accuracy guarantee, database record, similarity metrics, large databases, small sample, schema matching, instance-level schema, bounded error, close approximation, corresponding value, data stream, fast matching algorithm
Field: Data mining, Data stream mining, Optimal matching, Computer science, Correctness, 3-dimensional matching, STREAMS, Schema matching, Schema (psychology), Blossom algorithm
DocType: Journal
Volume: 12
Issue: 3
ISSN: 1088-467X
Citations: 3
PageRank: 0.51
References: 18
Authors: 3
Name, Order, Citations, PageRank
Szymon Jaroszewicz, 1, 352, 26.73
Lenka Ivantysynova, 2, 35, 4.56
Tobias Scheffer, 3, 1862, 139.64