Title
Automatic Generation of Dictionaries - The Journalistic Lexicon Case.
Abstract
Text normalisation is an important task in Natural Language Processing. Through normalisation, free text is mapped onto dictionaries, i.e. indexed collections of locutions recognised as typical of a particular jargon. In general, technical dictionaries are difficult to build and validate. They are typically constructed by hand, on the basis of everyday human work, and are agreement-based. This is undoubtedly time-consuming: the approach requires strong human supervision and does not provide a general methodology. In this paper, we take the first steps towards the automatic construction of a dictionary for the Italian journalistic lexicon, called NewsDict, based on sub-dictionaries that characterise the main topics occurring in newspaper articles. We exploit a dataset of annotated documents from some Italian newspapers and a statistical technique based on the Mutual Information Principle. Each document contains information such as the release date and the topic of the article, and has been annotated directly by its author. To check the accuracy of the resulting dictionary, we run an initial test in which we normalise a control set of newspaper articles into NewsDict. By crossing the results against the human annotation, we provide a first measure of the performance of the described methodology.
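The abstract does not give the exact scoring formula, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes document-level pointwise mutual information between a term t and a topic label c, PMI(t, c) = log[ P(t, c) / (P(t) P(c)) ], keeps the top-k terms per topic as that topic's sub-dictionary, normalises a new article to the topic whose sub-dictionary it overlaps most, and measures accuracy against the authors' topic labels. The function names (`build_subdictionaries`, `normalise`, `accuracy`), the parameters `k` and `min_count`, and the toy data are all hypothetical, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def build_subdictionaries(docs, k=3, min_count=2):
    """Hypothetical sketch: docs is a list of (tokens, topic) pairs.
    Returns {topic: [terms]} ranked by document-level PMI (assumed scoring)."""
    n = len(docs)
    topic_count = Counter(topic for _, topic in docs)
    term_count = Counter()   # number of documents containing each term
    joint = Counter()        # number of documents with (term, topic)
    for tokens, topic in docs:
        for t in set(tokens):  # presence per document, not raw frequency
            term_count[t] += 1
            joint[(t, topic)] += 1
    scored = defaultdict(list)
    for (t, c), n_tc in joint.items():
        if term_count[t] < min_count:  # skip rare terms with unstable PMI
            continue
        pmi = math.log((n_tc * n) / (term_count[t] * topic_count[c]))
        scored[c].append((pmi, t))
    # Highest PMI first; ties broken alphabetically for reproducibility.
    return {c: [t for _, t in sorted(pairs, key=lambda p: (-p[0], p[1]))[:k]]
            for c, pairs in scored.items()}

def normalise(tokens, subdicts):
    """Map a document to the topic whose sub-dictionary shares most terms;
    returns None when no sub-dictionary term occurs in the document."""
    words = set(tokens)
    scores = {c: len(words & set(terms)) for c, terms in subdicts.items()}
    best = max(sorted(scores), key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None

def accuracy(control, subdicts):
    """Fraction of control documents whose predicted topic matches the human label."""
    hits = sum(normalise(tokens, subdicts) == topic for tokens, topic in control)
    return hits / len(control)

# Toy, invented data standing in for author-annotated articles.
train = [
    ("governo parlamento legge voto".split(), "politica"),
    ("governo ministro legge elezioni".split(), "politica"),
    ("partita gol campionato squadra".split(), "sport"),
    ("squadra allenatore gol partita".split(), "sport"),
    ("borsa mercato governo tassi".split(), "economia"),
    ("mercato tassi inflazione borsa".split(), "economia"),
]
control = [
    ("il governo approva la legge".split(), "politica"),
    ("gol decisivo nella partita".split(), "sport"),
    ("la borsa chiude in calo".split(), "economia"),
    ("nuovo stadio per la squadra".split(), "sport"),
    ("il mercato del calcio".split(), "sport"),  # lexically ambiguous case
]

subdicts = build_subdictionaries(train, k=3, min_count=2)
print(subdicts["sport"])            # ['gol', 'partita', 'squadra']
print(accuracy(control, subdicts))  # 0.8
```

In this toy run the last control article is labelled "sport" by its author but contains "mercato", a term characteristic of the economics sub-dictionary, so it is misclassified. Cases like this are what crossing the results against human annotation is meant to expose.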
Year: 2019
DOI: 10.1007/978-3-030-22999-3_63
Venue: Advances and Trends in Artificial Intelligence: From Theory to Practice
Keywords: Text normalization, Statistical natural language processing, Automatic generation of dictionaries
Field: Annotation, Release date, Computer science, Newspaper, Exploit, Lexicon, Artificial intelligence, Mutual information, Natural language processing, Fist, Text normalization
DocType: Conference
Volume: 11606
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name                  Order   Citations   PageRank
Matteo Cristani       1       259         34.75
Claudio Tomazzoli     2       25          11.36
Margherita Zorzi      3       81          16.16