Abstract | ||
---|---|---|
Entity matching over bibliographic databases such as DBLP, Arnetminer, Scopus, and Google Scholar requires clustering together the records that refer to the same entity. Checking the correctness of such techniques is challenging because the true results are often unknown or very difficult to obtain. Correctness is generally evaluated with the F-measure, which requires gold standard data. However, obtaining gold standard data is difficult and time-consuming, and it requires the help of human annotators. To address this problem, we propose using internal cluster validity measures to evaluate the quality of entity matching techniques. To handle bibliographic databases, several distance measures are used and incorporated into the definitions of various internal cluster validity measures, making them applicable to the entity matching problem. A comparative analysis of these internal validity measures is carried out on a large collection of data sets. |
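
To make the idea of an internal validity measure with a pluggable distance concrete, here is a minimal sketch. The abstract does not name the specific measures or distances used in the paper, so everything below is an assumption chosen for illustration: the silhouette index stands in for "an internal cluster validity measure", Jaccard distance over token sets stands in for "a distance measure", and the function names `jaccard_distance` and `silhouette` and the toy records are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

Record = FrozenSet[str]  # assumed representation: a record as a set of tokens


def jaccard_distance(a: Record, b: Record) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def silhouette(
    records: Sequence[Record],
    labels: Sequence[int],
    dist: Callable[[Record, Record], float],
) -> float:
    """Mean silhouette width of a clustering under a caller-supplied distance.

    No gold standard is needed: the score depends only on the records,
    the cluster labels produced by the matcher, and the distance function.
    """
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # a singleton cluster contributes a width of 0
        # a: mean distance to the other members of the record's own cluster
        a = sum(dist(records[i], records[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(records[i], records[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


# Toy data (hypothetical): each record is a set of single-letter tokens.
records = [frozenset("ab"), frozenset("abc"), frozenset("xy"), frozenset("xyz")]
good = [0, 0, 1, 1]  # groups the overlapping records together
bad = [0, 1, 0, 1]   # splits them across clusters

good_score = silhouette(records, good, jaccard_distance)
bad_score = silhouette(records, bad, jaccard_distance)
```

On this toy data the grouping that keeps overlapping records together scores about 0.67, while the grouping that splits them scores about -0.33, so the measure ranks the two matchings without consulting any annotated ground truth. Swapping in a different `dist` function changes the measure without touching the rest of the code.
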
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ICACCI.2018.8554946 | 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI) |
Field | DocType | Citations |
Information system, Data set, Bibliographic database, Information retrieval, Computer science, Correctness, Control engineering, Scopus, Internal validity, Cluster analysis, Distance measures | Conference | 0 |
PageRank | References | Authors |
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sumit Mishra | 1 | 12 | 5.94 |
Sriparna Saha | 2 | 1064 | 106.07 |
Samrat Mondal | 3 | 100 | 18.02 |