Title
On Validation of Clustering Techniques for Bibliographic Databases
Abstract
In entity name disambiguation, evaluating the performance of any approach is difficult because the correct (actual) results are often unknown. Three measures are generally used for evaluation: precision, recall, and F-measure. All three are external validity indices because they require gold standard data. In bibliographic databases such as DBLP, Arnetminer, Scopus, Web of Science, and Google Scholar, however, gold standard data is not readily available and is very difficult to obtain because of the overlapping nature of the data. Other metrics are therefore needed for evaluation. This paper proposes schemes based on internal cluster validity indices for evaluating entity name disambiguation algorithms applied to bibliographic data without using any gold standard dataset. Two new internal validity indices are also proposed for this purpose. Experimental results on seven bibliographic datasets show that the proposed internal cluster validity indices can compare the results obtained by different methods without a prior gold standard. The paper thus demonstrates a novel way of evaluating any entity matching algorithm on bibliographic datasets without prior or gold standard information.
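To illustrate what an internal validity index looks like in practice, here is a minimal sketch that scores a clustering using only the data points and the predicted cluster labels, with no gold standard. It uses the well-known silhouette coefficient as a stand-in; it is not one of the two indices proposed in the paper, which the abstract does not specify. The function name, the toy 2-D feature vectors, and the two candidate labelings are assumptions made for illustration only.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient (a standard internal index, used here
    as a stand-in; not the index proposed in the paper)."""
    # Group points by their predicted cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster
        # (p's distance to itself is 0, so dividing by len - 1 excludes it).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of another cluster.
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


# Hypothetical feature vectors for six publication records.
records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
method_a = [0, 0, 0, 1, 1, 1]  # labels from one disambiguation method
method_b = [0, 1, 0, 1, 0, 1]  # labels from a competing method

score_a = silhouette_score(records, method_a)
score_b = silhouette_score(records, method_b)
print("method A:", round(score_a, 3), "method B:", round(score_b, 3))
```

Because the score depends only on distances within and between predicted clusters, the two methods can be ranked against each other without knowing which records truly belong to the same author. A higher value indicates more compact, better separated clusters.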
Year
2014
DOI
10.1109/ICPR.2014.543
Venue
Pattern Recognition
Keywords
bibliographic systems, database management systems, pattern clustering, DBLP, Scopus, Web-of-science, arnetminer, bibliographic databases, clustering technique validation, disambiguation algorithms, entity matching algorithm, external validity indices, f-measure, google scholar, performance evaluation
Field
Information system, Data mining, Information retrieval, Cluster validity index, Computer science, Scopus, Internal validity, Cluster analysis, External validity, Name disambiguation, Database, Blossom algorithm
DocType
Conference
ISSN
1051-4651
Citations
2
PageRank
0.35
References
13
Authors
3
Name            Order  Citations  PageRank
Sumit Mishra    1      3          0.70
Sriparna Saha   2      1064       106.07
Samrat Mondal   3      100        18.02