Title
Entity Matching Across Multiple Heterogeneous Data Sources.
Abstract
Entity matching is the problem of identifying which entities in one data source refer to the same real-world entity as those in other sources. Identifying entities across heterogeneous data sources is essential to applications such as entity profiling and product recommendation. The matching process is not only prohibitively expensive for large data sources, since it involves all tuples from two or more data sources, but also needs to handle heterogeneous entity attributes. In this paper, we design an unsupervised approach, called EMAN, to match entities across two or more heterogeneous data sources. The algorithm uses a locality-sensitive hashing scheme to reduce the number of candidate tuples and speed up the matching process. To handle heterogeneous entity attributes, we employ the exponential family to model the similarities between different attributes. EMAN is highly accurate and efficient even without any ground-truth tuples. We evaluate EMAN on re-identifying entities within the same data source, as well as on matching entities across three real data sources. Our experimental results show that the proposed approach outperforms the comparable baseline.
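The candidate-reduction idea mentioned in the abstract can be illustrated with a minimal sketch. This is not EMAN's implementation: the abstract does not say which hash family or record representation is used, so MinHash over character 3-grams with banding is an assumption, and the names `shingles`, `minhash_signature`, and `candidate_pairs` are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a lower-cased string."""
    s = text.lower()
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def minhash_signature(items, num_hashes=32):
    """MinHash signature: for each seeded hash, keep the minimum over the set."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{x}".encode()).digest()[:8], "big")
            for x in items
        ))
    return sig


def candidate_pairs(records, num_hashes=32, bands=16):
    """Bucket records by signature bands (assumed scheme, not taken from the paper).

    Two records become a candidate pair if they agree on every row of at
    least one band, so only those pairs need a full similarity comparison.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rid, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(rid)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Calling `candidate_pairs({"a1": "Aoying Zhou", "b1": "Aoying Zhou", "c1": "unrelated"})` always returns the pair `("a1", "b1")`, since identical strings have identical signatures. Whether near-duplicates collide depends on the assumed `num_hashes` and `bands` settings.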
Year
2016
Venue
DASFAA
Field
Locality-sensitive hashing, Data source, Data mining, Computer science, Tuple, Profiling (computer programming), Exponential family, Schema (psychology), Speedup
DocType
Conference
Citations
4
PageRank
0.42
References
14
Authors
5
Name           Order  Citations  PageRank
Chao Kong      1      4          1.44
Ming Gao       2      76         9.41
Chen Xu        3      31         4.43
Weining Qian   4      1064       81.09
Aoying Zhou    5      2632       238.85