Title
Consolidation of References to Persons in Bibliographic Databases
Abstract
Entity resolution is the process of determining whether, in a specific context, two or more references correspond to the same entity. In this work, we address this problem in the context of references to persons as they are found in bibliographic data, specifically in the case of consolidating multiple datasets. Our solution follows the extraction, transformation and loading (ETL) process typical of data warehouses. It computes the similarities of the attribute values of the references and employs a decision tree to decide when the references match. We describe the characteristics of these references within bibliographic datasets, and how we explored those characteristics by developing new similarity metrics to improve the quality of the consolidation process. We evaluated our work by designing an experiment with data from four national libraries. The results show that the proposed similarity metrics contribute significantly to the consolidation process.
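The matching step summarized in the abstract (attribute similarities feeding a decision tree) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record fields `name` and `birth`, the helper functions, the use of `difflib.SequenceMatcher` as a generic string similarity, and the hand-written thresholds. It does not reproduce the paper's similarity metrics or its learned tree.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Generic string similarity in [0, 1] after light normalization.

    Stand-in for a person-name metric; not the metric proposed in the paper.
    """
    def norm(s: str) -> str:
        # Lowercase, treat commas as spaces, collapse repeated whitespace.
        return " ".join(s.lower().replace(",", " ").split())

    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def year_similarity(a, b) -> float:
    """1.0 if both years agree, 0.0 if they differ, 0.5 if either is missing."""
    if a is None or b is None:
        return 0.5
    return 1.0 if a == b else 0.0


def references_match(ref_a: dict, ref_b: dict) -> bool:
    """Tiny hand-written decision tree over the two similarity scores.

    The thresholds (0.9 and 0.7) are hypothetical; a real system would
    learn the tree from labeled pairs of references.
    """
    name_sim = name_similarity(ref_a["name"], ref_b["name"])
    birth_sim = year_similarity(ref_a.get("birth"), ref_b.get("birth"))

    if name_sim >= 0.9:
        # Near-identical names: accept unless the birth years conflict.
        return birth_sim > 0.0
    if name_sim >= 0.7:
        # Similar names: require an exact birth-year agreement.
        return birth_sim == 1.0
    return False


if __name__ == "__main__":
    a = {"name": "Pessoa, Fernando", "birth": 1888}
    b = {"name": "pessoa,  fernando", "birth": 1888}
    print(references_match(a, b))
```

In this sketch a missing birth year is treated as neutral evidence rather than as a conflict, which is one possible design choice for sparse bibliographic records.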
Year
2008
DOI
10.1007/978-3-540-89533-6_26
Venue
ICADL
Keywords
bibliographic datasets, entity resolution, data warehouse, attribute value, consolidation process, bibliographic data, new similarity metrics, bibliographic databases, specific context, multiple datasets, proposed similarity metrics, machine learning, decision tree
Field
Data warehouse, Decision tree, Data mining, Name resolution, Information retrieval, Computer science, Consolidation (soil), Database
DocType
Conference
Volume
5362
ISSN
0302-9743
Citations
3
PageRank
0.56
References
10
Authors
3
Name                 Order  Citations  PageRank
Nuno Freire          1      97         22.27
José Luis Borbinha   2      151        20.02
Bruno Martins        3      441        34.58