Title
GAEMTBD: Genetic algorithm based entity matching techniques for bibliographic databases.
Abstract
Entity matching is the task of mapping the records in a database to their corresponding entities. It is a well-known problem in the fields of databases and artificial intelligence. In digital libraries such as DBLP, ArnetMiner, Google Scholar, Scopus, Web of Science, AllMusic, and IMDB, some attributes evolve over time, i.e., they take different values at different instants. For example, the affiliation and email-id of an author may change in bibliographic databases such as DBLP and ArnetMiner, which maintain the publication details of various authors. A taxpayer can change his or her address over time. Sometimes people change their surnames due to marriage. When a database contains records of this nature and the number of records grows large, it becomes challenging to identify which records belong to which entity because no proper key is available. In the current paper, the problem of automatically partitioning records is posed as an optimization problem, and a genetic algorithm based technique is proposed to solve the entity matching problem. The proposed approach automatically determines the number of partitions in a bibliographic dataset. A comparative analysis with two existing systems, DBLP and ArnetMiner, over sixteen bibliographic datasets demonstrates the efficacy of the proposed approach.
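The idea of treating record partitioning as an optimization problem can be illustrated with a small sketch. It is not the method of the paper: the record representation (a set of tokens per record), the Jaccard distance, the silhouette width used as the cluster validity index, and all names and parameters (`jaccard_distance`, `silhouette`, `evolve`, `pop_size`, `mutation_rate`) are assumptions chosen for illustration. Each chromosome is a vector of cluster labels, so the number of partitions is simply the number of distinct labels and is not fixed in advance.

```python
import random


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def silhouette(labels, dist):
    """Mean silhouette width of a labelling; -1.0 with fewer than two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a width of 0
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def evolve(dist, pop_size=40, generations=80, mutation_rate=0.1, seed=0):
    """Toy genetic algorithm over label vectors (assumed operators, see lead-in)."""
    rng = random.Random(seed)
    n = len(dist)

    def fitness(ind):
        return silhouette(ind, dist)

    # Any label in 0..n-1 is allowed, so the cluster count can vary freely.
    population = [[rng.randrange(n) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        next_gen = [elite[:]]  # elitism: the best individual always survives
        while len(next_gen) < pop_size:
            # Tournament selection of two parents, three candidates each.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover followed by per-gene random relabelling.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [
                rng.randrange(n) if rng.random() < mutation_rate else g
                for g in child
            ]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best), history


if __name__ == "__main__":
    # Hypothetical records: co-author and title tokens for six publications.
    records = [
        {"saha", "mondal", "genetic"},
        {"saha", "mondal", "clustering"},
        {"saha", "mondal", "entity"},
        {"lee", "kim", "vision"},
        {"lee", "kim", "robotics"},
        {"lee", "kim", "imaging"},
    ]
    dist = [[jaccard_distance(a, b) for b in records] for a in records]
    best, score, _ = evolve(dist)
    print(best, len(set(best)), score)
```

Because the best individual is copied unchanged into every new generation, the best fitness never decreases between generations. Uniform crossover on raw label vectors ignores the fact that two different labellings can encode the same partition; a practical encoding would likely need to handle that.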
Year: 2017
DOI: 10.1007/s10489-016-0874-z
Venue: Appl. Intell.
Keywords: Entity matching, Genetic algorithm, Cluster validity index, Distance measure, Record similarity, Bibliographic database
Field: Data mining, Information retrieval, Bibliographic database, Computer science, Cluster validity index, Scopus, Digital library, Optimization problem, Genetic algorithm, Database
DocType: Journal
Volume: 47
Issue: 1
ISSN: 0924-669X
Citations: 0
PageRank: 0.34
References: 39
Authors: 3
Name | Order | Citations | PageRank
Sumit Mishra | 1 | 12 | 5.94
Sriparna Saha | 2 | 1064 | 106.07
Samrat Mondal | 3 | 100 | 18.02