Title
Automatic semantic tagging of unknown proper names
Abstract
Implemented methods for proper names recognition rely on large gazetteers of common proper nouns and a set of heuristic rules (e.g. Mr. as an indicator of a PERSON entity type). Though the performance of current PN recognizers is very high (over 90%), it is important to note that this problem is by no means a "solved problem". Existing systems perform extremely well on newswire corpora by virtue of the availability of large gazetteers and rule bases designed for specific tasks (e.g. recognition of Organization and Person entity types as specified in recent Message Understanding Conferences, MUC). However, large gazetteers are not available for most languages and applications other than newswire texts and, in any case, proper nouns are an open class. In this paper we describe a context-based method to assign an entity type to unknown proper names (PNs). Like many others, our system relies on a gazetteer and a set of context-dependent heuristics to classify proper nouns. However, due to the unavailability of large gazetteers in Italian, over 20% of detected PNs cannot be semantically tagged. The algorithm that we propose assigns an entity type to an unknown PN based on the analysis of syntactically and semantically similar contexts already seen in the application corpus. The performance of the algorithm is evaluated not only in terms of precision, following the tradition of MUC conferences, but also in terms of Information Gain, an information-theoretic measure that takes into account the complexity of the classification task.
Year
1998
DOI
10.3115/980451.980892
Venue
COLING-ACL
Keywords
large gazetteer, automatic semantic, current pn recognizers, entity type, muc conference, unknown proper name, proper noun, common proper noun, newswire corpus, proper names recognition, person entity type, information gain, noun, proper names
Field
Heuristic, Computer science, Information gain, Heuristics, Unavailability, Artificial intelligence, Natural language processing, Proper noun, Machine learning
DocType
Conference
Volume
P98-1
Citations
9
PageRank
4.07
References
10
Authors
3
Name
Order
Citations
PageRank
Alessandro Cucchiarelli122636.38
Danilo Luzi2175.47
Paola Velardi31553163.66