Title
Facing the Identification Problem in Language-Related Scientific Data Analysis
Abstract
This paper describes the problems that arise when studying large amounts of data over time that require entity normalization, applied not to the usual genres of news or political speech but to academic discourse about language resources, technologies and sciences. It reports on the normalization processes that had to be applied to produce data usable for computing statistics in three past studies, on the LRE Map, the ISCA Archive and the LDC Bibliography. It shows the need for human expertise during normalization and the need to adapt the work to the study objectives. It investigates possible improvements that would reduce the workload needed to produce comparable results. Through this paper, we show the need to define and agree on persistent, unique international identifiers.
Year
2014
Venue
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
Keywords
Data identification, Normalization, ISLRN, Archive analysis, Bibliometrics, Scientometrics
Field
Computer science, Artificial intelligence, Natural language processing, Parameter identification problem
DocType
Conference
Citations
3
PageRank
0.48
References
2
Authors
5
Name                Order  Citations  PageRank
Joseph Mariani      1      704        94.01
Christopher Cieri   2      123        42.44
Gil Francopoulo     3      104        19.90
Patrick Paroubek    4      885        66.97
Marine Delaborde    5      3          0.48