Abstract | ||
---|---|---|
Entity resolution is the process of determining if, in a specific context, two or more references correspond to the same entity. In this work, we address this problem in the context of references to persons as they are found in bibliographic data, specifically in the case of consolidating multiple datasets. Our solution follows the extraction, transformation and loading (ETL) process, typical in data warehouses. It computes the similarities of the attribute values for the references, and employs a decision tree to decide when the references match. We describe the characteristics of these references within bibliographic datasets, and how we explored those characteristics by developing new similarity metrics to improve the quality of the consolidation process. We evaluated our work by designing an experiment with data from four national libraries. The results show that the proposed similarity metrics contribute significantly to the consolidation process. |
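
The matching step summarized in the abstract (per-attribute similarities fed to a decision tree) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the record fields `name` and `birth_year`, the helper names, the generic string similarity from Python's `difflib`, and the hand-picked thresholds. The abstract does not specify the paper's similarity metrics or its learned tree, so this is not the authors' method.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after light normalization.

    Generic stand-in metric, not one of the paper's proposed metrics.
    """
    norm_a = " ".join(a.lower().replace(",", " ").split())
    norm_b = " ".join(b.lower().replace(",", " ").split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def year_similarity(a, b) -> float:
    """1.0 for equal years, 0.0 for conflicting years, 0.5 if either is missing."""
    if a is None or b is None:
        return 0.5
    return 1.0 if a == b else 0.0


def similarity_vector(ref_a: dict, ref_b: dict) -> dict:
    """Compare two person references attribute by attribute (hypothetical fields)."""
    return {
        "name": name_similarity(ref_a["name"], ref_b["name"]),
        "birth_year": year_similarity(ref_a.get("birth_year"), ref_b.get("birth_year")),
    }


def is_match(sims: dict) -> bool:
    """Hand-written decision tree; thresholds are illustrative, not learned."""
    if sims["name"] >= 0.9:
        # Strong name evidence: reject only when the years conflict.
        return sims["birth_year"] >= 0.5
    if sims["name"] >= 0.7:
        # Weaker name evidence: require an exact year match.
        return sims["birth_year"] == 1.0
    return False
```

For example, `is_match(similarity_vector(a, b))` with two hypothetical records that share a name (up to case and punctuation) and a birth year reports a match, while the same names with conflicting years do not. In practice the tree would be trained on labeled reference pairs rather than written by hand.
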
Year | DOI | Venue |
---|---|---|
2008 | 10.1007/978-3-540-89533-6_26 | ICADL |
Keywords | Field | DocType |
bibliographic datasets,entity resolution,data warehouse,attribute value,consolidation process,bibliographic data,new similarity metrics,bibliographic databases,specific context,multiple datasets,proposed similarity metrics,machine learning,decision tree | Data warehouse,Decision tree,Data mining,Name resolution,Information retrieval,Computer science,Consolidation (soil),Database | Conference |
Volume | ISSN | Citations |
5362 | 0302-9743 | 3 |
PageRank | References | Authors |
0.56 | 10 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Nuno Freire | 1 | 97 | 22.27 |
José Luis Borbinha | 2 | 151 | 20.02 |
Bruno Martins | 3 | 441 | 34.58 |