Title
Methods for the Extraction of Hungarian Multi-Word Lexemes
Abstract
This paper describes an experiment on extracting Hungarian multi-word lexemes from a corpus using statistical methods. Corpus preparation (the addition of POS tags and stems) was done automatically. From the corpus, ⟨verb+noun+casemark⟩ patterns were extracted as collocation candidates. The evaluation shows that the statistical methods used by Villada Moiron (2004a) to identify Dutch V + PP collocations can also be applied to the Hungarian data. Some collocation types (such as verbal arguments) require special extraction methods, as explained in the evaluation section. Finally, we suggest that the extraction process can be further improved by blending statistical techniques with rule-based and dictionary-based methods.
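The pipeline summarized in the abstract (collect ⟨verb+noun+casemark⟩ candidates from a tagged, stemmed corpus, then rank them statistically) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation. The token format `(stem, pos, case)`, the tags `"V"`/`"N"`, the case labels, the sentence-level co-occurrence window, and the helper names `extract_candidates`, `log_likelihood`, and `rank_candidates` are all hypothetical. Dunning's log-likelihood ratio (G²) is used as one common association measure; the abstract does not specify which measures Villada Moiron (2004a) applied, so this should not be read as a reproduction of them.

```python
import math
from collections import Counter
from itertools import product


def extract_candidates(sentences):
    """Yield (verb, noun, case) triples for every verb/case-marked noun pair in a sentence.

    Assumption: each token is a hypothetical (stem, pos, case) tuple, with
    pos "V" for verbs, "N" for nouns, and case None for tokens without a case mark.
    Sentence-level co-occurrence is a simplification; a real system would use clause or dependency info.
    """
    for sent in sentences:
        verbs = [stem for stem, pos, _ in sent if pos == "V"]
        nouns = [(stem, case) for stem, pos, case in sent if pos == "N" and case]
        for verb, (noun, case) in product(verbs, nouns):
            yield verb, noun, case


def log_likelihood(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table of observed counts."""
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22  # row totals: verb present / absent
    c1, c2 = k11 + k21, k12 + k22  # column totals: noun+case present / absent

    def term(k, expected):
        # Cells with zero observations contribute nothing (0 * log 0 is taken as 0).
        return k * math.log(k / expected) if k > 0 else 0.0

    return 2 * (term(k11, r1 * c1 / n) + term(k12, r1 * c2 / n)
                + term(k21, r2 * c1 / n) + term(k22, r2 * c2 / n))


def rank_candidates(sentences, min_freq=2):
    """Return [((verb, noun, case), G2), ...] sorted by descending association score."""
    triples = Counter(extract_candidates(sentences))
    verb_freq, arg_freq = Counter(), Counter()
    for (verb, noun, case), f in triples.items():
        verb_freq[verb] += f
        arg_freq[(noun, case)] += f
    n = sum(triples.values())

    scored = []
    for (verb, noun, case), k11 in triples.items():
        if k11 < min_freq:  # frequency cut-off on rare candidates
            continue
        k12 = verb_freq[verb] - k11            # verb with other noun+case
        k21 = arg_freq[(noun, case)] - k11     # noun+case with other verbs
        k22 = n - k11 - k12 - k21              # neither
        scored.append(((verb, noun, case), log_likelihood(k11, k12, k21, k22)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, hand-made corpus (not data from the paper); "részt vesz" = 'take part'.
    corpus = [
        [("vesz", "V", None), ("rész", "N", "ACC")],
        [("vesz", "V", None), ("rész", "N", "ACC")],
        [("vesz", "V", None), ("rész", "N", "ACC")],
        [("lát", "V", None), ("kenyér", "N", "ACC")],
        [("lát", "V", None), ("kenyér", "N", "ACC")],
        [("vesz", "V", None), ("kenyér", "N", "ACC")],
        [("lát", "V", None), ("ház", "N", "ACC")],
    ]
    for candidate, score in rank_candidates(corpus):
        print(candidate, round(score, 3))
```

On this toy corpus the idiomatic pairing `("vesz", "rész", "ACC")` is ranked above the freer combination `("lát", "kenyér", "ACC")`. As the abstract notes, purely statistical ranking of this kind would still need rule-based or dictionary-based filtering for some collocation types.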
Year: 2003
Venue: CLIN
Keywords: rule based
Field: Computer science, Artificial intelligence, Natural language processing, Linguistics
DocType: Conference
Citations: 0
PageRank: 0.34
References: 3
Authors: 8