Who is Who - Name Disambiguation in Large-Scale Scientific Literature. - Citegraph

Paper Info

Title
Who is Who - Name Disambiguation in Large-Scale Scientific Literature.

Abstract
Name ambiguity is a long-standing problem that reduces the quality of queries over scientific literature. Most previous name disambiguation methods focus on centralized strategies. However, handling increasingly large-scale scientific literature datasets is more challenging. In this paper, we propose a distributed approach to the name disambiguation problem in large-scale scientific literature datasets. First, an initial multi-relational network is constructed from the co-authorship in each publication. Second, the similarity between each pair of nodes is computed from node attributes. Third, a logistic regression model is applied to predict the label of each edge. Finally, spectral clustering is used to repartition the connected components produced by the previous stage, yielding more precise results. We implement the model on the distributed Spark framework. The experimental results demonstrate that our approach is scalable and outperforms baseline methods.
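
The stages named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Spark implementation: the toy records, the two features (co-author Jaccard overlap and venue match), the hand-set logistic weights, and the 0.5 threshold are all assumptions made for illustration. A real system would learn the weights from labeled pairs. The final spectral clustering repartition is not shown.

```python
import math
from itertools import combinations

# Hypothetical toy records: papers that all carry the same ambiguous author name.
# "coauthors" lists the other names on each paper (assumed attribute).
papers = [
    {"id": "p1", "coauthors": {"A. Chen", "B. Li"}, "venue": "ICDM"},
    {"id": "p2", "coauthors": {"A. Chen", "C. Wu"}, "venue": "ICDM"},
    {"id": "p3", "coauthors": {"D. Roy"}, "venue": "CHI"},
    {"id": "p4", "coauthors": {"D. Roy", "E. Kim"}, "venue": "CHI"},
]

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def features(p, q):
    """Pairwise similarity features computed from node attributes (assumed choice)."""
    return [
        jaccard(p["coauthors"], q["coauthors"]),
        1.0 if p["venue"] == q["venue"] else 0.0,
    ]

# Assumed, hand-set parameters; a trained logistic regression would supply these.
WEIGHTS = [4.0, 2.0]
BIAS = -2.5

def same_author_probability(p, q):
    """Logistic model scoring whether an edge links papers by the same person."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(p, q)))
    return 1.0 / (1.0 + math.exp(-z))

def connected_components(nodes, edges):
    """Group nodes with union-find over the edges predicted as positive."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

def disambiguate(records, threshold=0.5):
    """Keep edges whose predicted probability meets the threshold, then cluster."""
    edges = [
        (p["id"], q["id"])
        for p, q in combinations(records, 2)
        if same_author_probability(p, q) >= threshold
    ]
    return connected_components([r["id"] for r in records], edges)

clusters = disambiguate(papers)
print(clusters)
```

Each resulting component is a candidate author identity. In the approach the abstract describes, these components would then be repartitioned with spectral clustering to split any that merge distinct people.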

Year	DOI	Venue
2019	10.1109/ICDMW.2019.00150	ICDM Workshops
Field	DocType	Citations
Spectral clustering,Data mining,Scientific literature,Spark (mathematics),Computer science,Connected component,Artificial intelligence,Name disambiguation,Ambiguity,Machine learning,Scalability	Conference	0
PageRank	References	Authors
0.34	0	3

Authors (3 rows)

Name	Order	Citations	PageRank
Hongliang Du	1	0	0.34
Zhiyi Jiang	2	0	0.68
Jianliang Gao	3	106	20.98