Title
On Evaluation Of Entity Matching Techniques For Bibliographic Database
Abstract
Entity matching techniques for bibliographic databases such as DBLP, Arnetminer, Scopus, and Google Scholar must cluster together the records that refer to the same entity. Checking the correctness of such techniques is challenging because the correct results are often unknown or very difficult to determine. Correctness is generally evaluated with the F-measure, which requires gold standard data. However, obtaining gold standard data is difficult and time-consuming, and requires the help of human annotators. To address this problem, we propose using internal cluster validity measures to evaluate the quality of entity matching techniques. To handle bibliographic databases, several distance measures are incorporated into the definitions of various internal cluster validity measures, making them applicable to the entity matching problem. A comparative analysis of various internal validity measures is carried out on a large collection of data sets.
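The idea of scoring a clustering without gold standard data can be illustrated with a minimal sketch. The abstract does not name the specific validity measures or distance measures used, so the silhouette index and the Jaccard distance over coauthor sets below are assumptions chosen for illustration, as are the sample records and all function names.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def silhouette(records, labels, dist=jaccard_distance):
    """Mean silhouette width over all records, for any pairwise distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: conventional score of 0
            continue
        # a: mean distance to the other records in the same cluster
        a = sum(dist(records[i], records[j]) for j in own) / len(own)
        # b: smallest mean distance to the records of any other cluster
        b = min(
            sum(dist(records[i], records[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical records: each is the set of coauthor names on one paper.
records = [
    {"A. Kumar", "B. Singh"},
    {"A. Kumar", "B. Singh", "C. Roy"},
    {"X. Li", "Y. Chen"},
    {"X. Li", "Y. Chen", "Z. Wu"},
]
good = silhouette(records, [0, 0, 1, 1])  # papers grouped by shared coauthors
bad = silhouette(records, [0, 1, 0, 1])   # papers grouped across coauthor sets
```

A higher score indicates tighter, better separated clusters, so two candidate entity matching outputs can be compared on the same records without annotated ground truth. Swapping in another distance only requires passing a different `dist` function.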
Year
2018
DOI
10.1109/ICACCI.2018.8554946
Venue
2018 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI)
Field
Information system, Data set, Bibliographic database, Information retrieval, Computer science, Correctness, Control engineering, Scopus, Internal validity, Cluster analysis, Distance measures
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
3
Name, Order, Citations, PageRank
Sumit Mishra, 1, 12, 5.94
Sriparna Saha, 2, 1064, 106.07
Samrat Mondal, 3, 100, 18.02