Title
Who is Who - Name Disambiguation in Large-Scale Scientific Literature.
Abstract
Name ambiguity is a long-standing problem that reduces the quality of queries over scientific literature. Most previous name disambiguation methods focus on centralized strategies. However, increasingly large scientific literature datasets are more challenging to handle. In this paper, we propose a distributed approach to the name disambiguation problem in large-scale scientific literature datasets. First, an initial multi-relational network is constructed from the co-authorship in each publication. Second, the similarity between each pair of nodes is computed from node attributes. Then a logistic regression model is applied to predict the label of each edge. Finally, spectral clustering is used to repartition the connected components produced by the previous stage, yielding more precise results. We implement the model on the distributed Spark framework. Experimental results demonstrate that our approach is scalable and outperforms baseline methods.
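The first three stages described in the abstract can be illustrated with a minimal single-machine sketch in plain Python. This is not the authors' implementation: the toy records, the two features (co-author Jaccard overlap and venue match), the hand-set logistic weights, and all function names are assumptions made for illustration. The weights are not a trained model, and the spectral repartitioning step and the Spark distribution are omitted entirely.

```python
import math
from itertools import combinations

# Hypothetical toy records: publications that all carry the same ambiguous author name.
papers = {
    "p1": {"coauthors": {"H. Du", "Z. Jiang"}, "venue": "ICDM"},
    "p2": {"coauthors": {"H. Du"}, "venue": "ICDM"},
    "p3": {"coauthors": {"A. Smith", "B. Lee"}, "venue": "CVPR"},
    "p4": {"coauthors": {"B. Lee"}, "venue": "CVPR"},
}

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def features(x, y):
    """Pairwise similarity features computed from node attributes (assumed feature set)."""
    return [
        jaccard(x["coauthors"], y["coauthors"]),
        1.0 if x["venue"] == y["venue"] else 0.0,
    ]

# Illustrative, hand-set parameters standing in for a trained logistic regression model.
WEIGHTS = [4.0, 2.0]
BIAS = -3.0

def same_author_probability(x, y):
    """Logistic score that two publications belong to the same person."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, features(x, y)))
    return 1.0 / (1.0 + math.exp(-z))

def connected_components(nodes, edges):
    """Group nodes linked by predicted same-author edges, using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())

def disambiguate(records, threshold=0.5):
    """Keep edges whose predicted probability reaches the threshold, then cluster."""
    ids = sorted(records)
    edges = [
        (a, b)
        for a, b in combinations(ids, 2)
        if same_author_probability(records[a], records[b]) >= threshold
    ]
    return connected_components(ids, edges)

clusters = disambiguate(papers)
print(clusters)
```

In the pipeline the abstract describes, each component produced this way would then be refined by spectral clustering, which this sketch does not attempt.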
Year: 2019
DOI: 10.1109/ICDMW.2019.00150
Venue: ICDM Workshops
Field: Spectral clustering, Data mining, Scientific literature, Spark (mathematics), Computer science, Connected component, Artificial intelligence, Name disambiguation, Ambiguity, Machine learning, Scalability
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name            Order  Citations  PageRank
Hongliang Du    1      0          0.34
Zhiyi Jiang     2      0          0.68
Jianliang Gao   3      106        20.98