Title
Unsupervised Lexical Acquisition for Part of Speech Tagging
Abstract
POS tagging is known to be considerably less accurate for unknown words (words that the POS tagger has not seen in its training corpora). A first step towards improving tagging accuracy is therefore to extend the coverage of the tagger's learned lexicon. It turns out that a simple procedure can extend this lexicon without additional, hard-to-obtain, hand-validated training corpora. The basic idea is to add new words, along with their (correct) POS tags, to the lexicon and to estimate the lexical distribution of these words from similar ambiguity classes already present in the lexicon. We present a method for automatically acquiring high-quality POS tagging lexicons based on morphological analysis and generation. Currently, the procedure works for Romanian, for which we have the required paradigmatic generation procedure, but the architecture remains general: given appropriate substitutes for the morphological generator and the POS tagger, one should obtain similar results.
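The abstract does not spell out how the lexical distribution of a new word is estimated, so the following is only a minimal sketch of one plausible reading: pool the tag counts of all known words that share an ambiguity class (the same set of possible tags) and assign the resulting distribution to any new word with that class. The function names, the toy lexicon, the English tag labels, and the uniform fallback are all assumptions made for illustration, not the authors' procedure; in the paper's setting the tag sets of new words would come from the morphological generator.

```python
from collections import Counter, defaultdict


def ambiguity_class_distributions(lexicon):
    """Pool tag counts over all known words that share an ambiguity class.

    `lexicon` maps word -> {tag: count}. The ambiguity class of a word is
    the set of tags it was observed with. Returns a mapping from each
    ambiguity class to a normalized tag distribution.
    """
    pooled = defaultdict(Counter)
    for tag_counts in lexicon.values():
        amb_class = frozenset(tag_counts)
        pooled[amb_class].update(tag_counts)

    distributions = {}
    for amb_class, counts in pooled.items():
        total = sum(counts.values())
        distributions[amb_class] = {tag: n / total for tag, n in counts.items()}
    return distributions


def estimate_new_word(tags, class_distributions):
    """Estimate P(tag | word) for an unseen word with known possible tags.

    Assumption: if the ambiguity class was never observed, fall back to a
    uniform distribution over the word's tags.
    """
    amb_class = frozenset(tags)
    if amb_class in class_distributions:
        return dict(class_distributions[amb_class])
    return {tag: 1.0 / len(amb_class) for tag in amb_class}


# Toy lexicon (hypothetical counts, for illustration only).
toy_lexicon = {
    "run": {"NOUN": 2, "VERB": 6},
    "walk": {"NOUN": 4, "VERB": 4},
    "the": {"DET": 10},
}
class_dists = ambiguity_class_distributions(toy_lexicon)

# A new word whose possible tags are NOUN and VERB inherits the pooled
# distribution of "run" and "walk".
seen_class = estimate_new_word(["NOUN", "VERB"], class_dists)

# A new word with an unseen ambiguity class gets the uniform fallback.
unseen_class = estimate_new_word(["ADJ", "ADV"], class_dists)
```

Pooling raw counts weights frequent words more heavily than rare ones; averaging per-word distributions instead would be an equally defensible choice under the same assumptions.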
Year: 2008
Venue: Sixth International Conference on Language Resources and Evaluation, LREC 2008
Keywords: morphological analysis
Field: Computer science, Romanian, Part-of-speech tagging, Speech recognition, Lexicon, Artificial intelligence, Natural language processing, Lexical acquisition, Ambiguity
DocType: Conference
Citations: 2
PageRank: 0.42
References: 4
Authors: 4
Name               Order   Citations   PageRank
Dan Tufis          1       485         58.39
Elena Irimia       2       24          6.76
Radu Ion           3       163         22.33
Alexandru Ceauşu   4       70          9.36