Abstract | ||
---|---|---|
Entity resolution identifies all records in a database that refer to the same entity. Mainstream solutions rely on supervised learning or crowd assistance, both of which require labor-intensive data annotation. To avoid human intervention, we propose an unsupervised graph-theoretic fusion framework with two components, ITER and CliqueRank. ITER constructs a weighted bipartite graph between terms and record-record pairs and iteratively propagates node salience until convergence. CliqueRank then constructs a record graph to estimate the likelihood that two records reside in the same clique. The likelihood derived by CliqueRank is fed back to ITER to rectify the edge weights until a joint optimum is reached. Experimental evaluation against 14 competitors shows that, without any labeled data or crowd assistance, our unsupervised framework is comparable or even superior to state-of-the-art methods on three benchmark datasets. |
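
The salience propagation that the abstract attributes to ITER can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the update rules, so the ones below (pair score as the normalized sum of shared-term weights, term weight as the mean score of the pairs sharing that term) are assumptions chosen only to show the general shape of iterating over a term/record-pair bipartite graph. The function name `propagate_salience` and its parameters are hypothetical, and the CliqueRank feedback step is omitted.

```python
from itertools import combinations


def propagate_salience(records, max_iters=50, tol=1e-9):
    """Toy salience propagation on a term / record-pair bipartite graph.

    `records` is a list of sets of terms. The update rules are illustrative
    assumptions, not the rules used by ITER in the paper.
    """
    # Pair nodes: every record pair that shares at least one term,
    # linked to the terms the two records have in common.
    pair_terms = {}
    for i, j in combinations(range(len(records)), 2):
        shared = records[i] & records[j]
        if shared:
            pair_terms[(i, j)] = shared

    # Term nodes: each term is linked back to the pairs that share it.
    term_pairs = {}
    for pair, terms in pair_terms.items():
        for term in terms:
            term_pairs.setdefault(term, []).append(pair)

    weights = {term: 1.0 for term in term_pairs}
    scores = {}
    for _ in range(max_iters):
        # Terms -> pairs: sum the weights of shared terms, scale to [0, 1].
        raw = {p: sum(weights[t] for t in ts) for p, ts in pair_terms.items()}
        top = max(raw.values(), default=1.0) or 1.0
        scores = {p: s / top for p, s in raw.items()}

        # Pairs -> terms: a term's weight is the mean score of its pairs.
        new_weights = {
            t: sum(scores[p] for p in ps) / len(ps)
            for t, ps in term_pairs.items()
        }

        # Stop once the term weights no longer change noticeably.
        delta = max(
            (abs(new_weights[t] - weights[t]) for t in weights), default=0.0
        )
        weights = new_weights
        if delta < tol:
            break
    return weights, scores


# Hypothetical toy data: records 0 and 1 describe the same product.
records = [
    {"apple", "iphone", "x"},
    {"apple", "iphone", "x", "64gb"},
    {"apple", "macbook"},
    {"samsung", "galaxy"},
]
weights, scores = propagate_salience(records)
```

Under these assumed rules, a term such as `apple` that is shared by both matching and non-matching pairs ends up with a lower weight than `iphone`, which is shared only by the likely match, and the pair `(0, 1)` receives the highest score.
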
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ICDE.2018.00070 | 2018 IEEE 34th International Conference on Data Engineering (ICDE) |
Keywords | Field | DocType |
---|---|---|
unsupervised entity resolution,random walk,bipartite graph | Convergence (routing),Graph,Data mining,Name resolution,Clique,Computer science,Bipartite graph,Fusion,Supervised learning,Salience (language) | Conference |
ISSN | ISBN | Citations |
---|---|---|
1063-6382 | 978-1-5386-5521-4 | 3 |
PageRank | References | Authors |
---|---|---|
0.37 | 26 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dongxiang Zhang | 1 | 743 | 43.89 |
Long Guo | 2 | 65 | 4.17 |
Xiangnan He | 3 | 3064 | 128.86 |
Jie Shao | 4 | 679 | 70.78 |
Sai Wu | 5 | 954 | 59.08 |
Heng Tao Shen | 6 | 6020 | 267.19 |