Entity Matching Across Multiple Heterogeneous Data Sources. - Citegraph

Paper Info

Title
Entity Matching Across Multiple Heterogeneous Data Sources.

Abstract
Entity matching is the problem of identifying which entities in one data source refer to the same real-world entity as those in other sources. Identifying entities across heterogeneous data sources is essential to applications such as entity profiling and product recommendation. The matching process is not only prohibitively expensive for large data sources, since it involves all tuples from two or more data sources, but must also handle heterogeneous entity attributes. In this paper, we design an unsupervised approach, called EMAN, to match entities across two or more heterogeneous data sources. The algorithm uses a locality-sensitive hashing scheme to reduce the number of candidate tuples and speed up the matching process. To handle the heterogeneous entity attributes, we employ the exponential family to model the similarities between the different attributes. EMAN is highly accurate and efficient even without any ground-truth tuples. We illustrate the performance of EMAN on re-identifying entities from the same data source, as well as on matching entities across three real data sources. Our experimental results show that the proposed approach outperforms the comparable baseline.
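
The candidate-reduction idea mentioned in the abstract can be illustrated with a minimal sketch of locality-sensitive hashing used as a blocking step. This is not EMAN's implementation: the abstract does not say which LSH family or parameters the authors use, so the MinHash-with-banding variant below, the character 3-gram tokenization, and all names (`shingles`, `minhash_signature`, `candidate_pairs`, `num_hashes`, `bands`) are assumptions chosen for illustration only.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased string."""
    s = text.lower()
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash value of the token set per seed."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(records, num_hashes=32, bands=16):
    """Return id pairs whose signatures agree on at least one band.

    records: dict mapping a record id to its text (hypothetical input format).
    Only these pairs would be passed on to a more expensive similarity model.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rid, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(rid)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

With a blocking step like this, records that share no band bucket are never compared, which avoids scoring every pair of tuples. The choice of `bands` and `num_hashes` trades recall against the number of candidates and would need tuning on real data.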

Year	Venue	Field
2016	DASFAA	Locality-sensitive hashing,Data source,Data mining,Computer science,Tuple,Profiling (computer programming),Exponential family,Schema (psychology),Speedup
DocType	Citations	PageRank
Conference	4	0.42
References	Authors
14	5

Authors (5 rows)

Name	Order	Citations	PageRank
Chao Kong	1	4	1.44
Ming Gao	2	76	9.41
Chen Xu	3	31	4.43
Weining Qian	4	1064	81.09
Aoying Zhou	5	2632	238.85
