Abstract |
---|
Name ambiguity is a long-standing problem that degrades the quality of queries over scientific literature. Most previous name disambiguation methods rely on centralized strategies, which struggle to keep up with the ever-growing scale of scientific literature datasets. In this paper, we propose a distributed approach to name disambiguation in large-scale scientific literature datasets. First, an initial multi-relational network is constructed from the co-authorship in each publication. Second, the similarity of each pair of nodes is computed from their node attributes. Then, a logistic regression model is applied to predict the labels of the edges. Finally, spectral clustering is used to repartition the connected components produced by the previous stage, yielding more precise results. We implement the model on the distributed Spark framework. Experimental results demonstrate that our approach is scalable and outperforms the baseline methods. |

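
The edge-labeling stage summarized in the abstract (pairwise node similarity followed by logistic regression) can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the authors' Spark implementation: the mention attributes (`coauthors`, `venue`, `title_words`), the three similarity features, and the helper names `pair_features`, `train_logreg`, and `predict_edge` are hypothetical, and the model is fit with plain batch gradient descent instead of a library solver.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pair_features(m1, m2):
    """Hypothetical similarity features for two mentions of the same name:
    co-author overlap, same-venue indicator, title-word overlap."""
    return [
        jaccard(m1["coauthors"], m2["coauthors"]),
        1.0 if m1["venue"] == m2["venue"] else 0.0,
        jaccard(m1["title_words"], m2["title_words"]),
    ]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_edge(w, b, x, threshold=0.5):
    """Edge label: 1 = same author, 0 = different authors."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


# Toy usage with made-up feature vectors: [coauthor_jaccard, same_venue, title_jaccard].
X_train = [[0.8, 1.0, 0.6], [0.7, 1.0, 0.5], [0.6, 0.0, 0.5],
           [0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [0.0, 1.0, 0.1]]
y_train = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X_train, y_train)
print(predict_edge(w, b, [0.75, 1.0, 0.55]))  # → 1
print(predict_edge(w, b, [0.05, 0.0, 0.05]))  # → 0
```

Keeping every feature in [0, 1] lets a simple linear model with a 0.5 probability threshold serve as a reasonable starting point; the paper's actual feature set, attributes, and decision threshold may differ.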
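
The final stage in the abstract, grouping mentions into the connected components of the predicted "same author" edges and then repartitioning a component with spectral clustering, can be sketched in pure Python as below. Assumptions: nodes are integer ids; a component is split into exactly two groups by the sign of the Fiedler vector of the unnormalized Laplacian `L = D - W` (the simplest form of spectral clustering; the paper may use a different Laplacian or number of clusters); and the eigenvector is found by power iteration. The function names are hypothetical, and nothing here is distributed.

```python
from collections import defaultdict


def connected_components(nodes, positive_edges):
    """Group nodes linked by edges predicted as 'same author' (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in positive_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    groups = defaultdict(list)
    for v in nodes:
        groups[find(v)].append(v)
    return list(groups.values())


def spectral_bipartition(n, weighted_edges, iters=300):
    """Split nodes 0..n-1 in two by the sign of the Fiedler vector of L = D - W."""
    W = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        W[u][v] += w
        W[v][u] += w
    deg = [sum(row) for row in W]
    # Shift so that c*I - L is positive definite: every eigenvalue of L is <= 2 * max degree.
    c = 2.0 * max(deg) + 1.0
    # Deterministic start vector (an assumption); it must not be orthogonal to the Fiedler vector.
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant eigenvector (eigenvalue 0 of L)
        # Power-iteration step with (c*I - L) x = c*x - D x + W x.
        y = [(c - deg[i]) * x[i] + sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    left = [i for i in range(n) if x[i] >= 0]
    right = [i for i in range(n) if x[i] < 0]
    return left, right


# Toy usage: two tightly knit triangles joined by one weak similarity edge.
print(sorted(sorted(c) for c in connected_components(list(range(6)), [(0, 1), (1, 2), (3, 4)])))
# → [[0, 1, 2], [3, 4], [5]]
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]
left, right = spectral_bipartition(6, edges)
print(sorted([sorted(left), sorted(right)]))  # → [[0, 1, 2], [3, 4, 5]]
```

The weak edge (weight 0.1) is what a single false-positive edge prediction looks like: connected components alone would merge the two triangles into one author, while the spectral split separates them again. The abstract does not say how components are selected for repartitioning or how many clusters are used, so both are left open in this sketch.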
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICDMW.2019.00150 | ICDM Workshops |

Field | DocType | Citations |
---|---|---|
Spectral clustering, Data mining, Scientific literature, Apache Spark, Computer science, Connected component, Artificial intelligence, Name disambiguation, Ambiguity, Machine learning, Scalability | Conference | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 0 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Hongliang Du | 1 | 0 | 0.34 |
Zhiyi Jiang | 2 | 0 | 0.68 |
Jianliang Gao | 3 | 106 | 20.98 |