Title
The automatic normalisation challenge: detailed addresses identification
Abstract
The correct attribution of scientific publications to their true owners is extremely important, given the detailed evaluation processes and the future investments based on them. This attribution is a hard job for bibliometricians because of the increasing number of documents and the rise of collaboration. Nevertheless, no published work offers a comprehensive solution to the problem. This article introduces a procedure for the detailed identification and normalisation of addresses to facilitate the correct allocation of the scientific production included in databases. Drawing on our long experience in the manual normalisation of addresses, we have created and maintained various master lists. We have already developed an application to detect institutional sectors (published in a previous paper), and we now analyse the details of particular institutions, taking advantage of our master tables. To test our methodology, we applied it to a Spanish data set that had already been manually codified (95,314 unique addresses included in the Web of Science databases for 2008). These data were analysed with a full-text search against our master lists, which assigned candidate codes to each address and determined which addresses could be encoded automatically and which should be reviewed manually. Comparing the automatic and manual codes, 87 % of the records were codified automatically, with an error rate of 1.9 %, so only 13 % had to be reviewed manually. Finally, we applied the Wilcoxon non-parametric test to show the validity of the methodology, comparing the detailed codes of centres already encoded with the automatically encoded ones, and concluded that their distributions were similar, with a significance of 0.078.
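The abstract describes matching each address against master lists with a full-text search and then deciding which addresses can be encoded automatically and which need manual review. The sketch below illustrates that idea under stated assumptions that are not taken from the paper: addresses and name variants are normalised (lower case, accents and punctuation removed), a code is a candidate when one of its variants appears as a whole-word phrase in the address, and an address is auto-encoded only when exactly one candidate code is found. The master list entries, institution codes, sample records and the function names `normalise`, `candidate_codes`, `triage` and `evaluate` are all hypothetical.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical master list: institution code -> known name variants.
# Illustrative entries only; these are NOT the authors' master tables.
MASTER = {
    "CSIC-CCHS": ["Centro de Ciencias Humanas y Sociales", "CCHS CSIC"],
    "UCM": ["Universidad Complutense de Madrid", "Univ Complutense"],
    "HGUGM": ["Hospital General Universitario Gregorio Marañón"],
}

# Hypothetical (address, manually assigned code) pairs for evaluation.
RECORDS = [
    ("Centro de Ciencias Humanas y Sociales (CCHS-CSIC), Madrid, Spain", "CSIC-CCHS"),
    ("Univ Complutense, Fac Med, Madrid, Spain", "UCM"),
    ("Hosp Gen Univ Gregorio Marañón, Madrid, Spain", "HGUGM"),
    ("Univ Complutense, Hospital General Universitario Gregorio Marañón, Madrid", "HGUGM"),
]


def normalise(text: str) -> str:
    """Strip accents, lower-case, replace punctuation with spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def candidate_codes(address: str, master: dict[str, list[str]]) -> set[str]:
    """Return every code with at least one variant occurring as a whole-word phrase."""
    padded = f" {normalise(address)} "  # padding enforces word boundaries
    return {
        code
        for code, variants in master.items()
        if any(f" {normalise(v)} " in padded for v in variants)
    }


def triage(address: str, master: dict[str, list[str]]) -> tuple[str, str | None]:
    """Auto-encode when exactly one code matches; otherwise flag for manual review."""
    codes = candidate_codes(address, master)
    if len(codes) == 1:
        return "auto", next(iter(codes))
    return "manual", None


def evaluate(records, master) -> tuple[float, float]:
    """Share of records auto-encoded, and error rate among the auto-encoded ones."""
    auto = errors = 0
    for address, manual_code in records:
        status, code = triage(address, master)
        if status == "auto":
            auto += 1
            errors += code != manual_code
    return auto / len(records), (errors / auto if auto else 0.0)


if __name__ == "__main__":
    for address, _ in RECORDS:
        print(triage(address, MASTER), address)
    print(evaluate(RECORDS, MASTER))  # (0.5, 0.0) for these sample records
```

Sending both unmatched addresses (the abbreviated hospital name) and ambiguous ones (two institutions in one address) to manual review mirrors the abstract's split between automatically encoded and manually reviewed records. A production system would likely need richer matching, such as abbreviation expansion or city and country checks, than this exact-phrase rule.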
Year: 2013
DOI: https://doi.org/10.1007/s11192-013-0965-0
Venue: Scientometrics
Keywords: Addresses normalisation, Automatic procedures, Bibliometric indicators, Web of Science databases
Field: Data science, Data mining, Information retrieval, Scientific production, Computer science, Full text search, Wilcoxon signed-rank test, Attribution
Volume: 95
Issue: 3
ISSN: 0138-9130
Citations: 7
PageRank: 0.63
References: 16
Authors: 3
Fernanda Morillo (order 1; citations 178; PageRank 16.26)
Ignacio Santabárbara (order 2; citations 7; PageRank 0.63)
Javier Aparicio (order 3; citations 41; PageRank 3.31)