Abstract | ||
---|---|---|
This paper describes an experiment on extracting Hungarian multi-word lexemes from a corpus using statistical methods. Corpus preparation (the addition of POS tags and stems) was done automatically. From the corpus, ⟨verb+noun+casemark⟩ patterns were extracted as collocation candidates. Evaluation shows that the statistical methods used by Villada Moiron (2004a) to identify Dutch V + PP collocations can also be applied to the Hungarian data. Some collocation types (such as verbal arguments) require special extraction methods, as explained in the evaluation section. Finally, we suggest that the extraction process can be further improved by blending statistical techniques with rule-based and dictionary-based methods. |
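
The abstract does not name the association measures that were used, so the following is only a minimal sketch of how ⟨verb+noun+casemark⟩ candidates might be ranked statistically. It assumes pointwise mutual information (PMI) between the verb and its noun+casemark complement as the score. The function name `pmi_scores`, the case labels, and the toy counts are hypothetical illustrations, not data or methods from the paper.

```python
import math
from collections import Counter


def pmi_scores(candidates):
    """Score (verb, noun, case) triples by PMI between verb and noun+case.

    PMI is an illustrative choice here; the paper's actual measures
    are not stated in the abstract.
    """
    total = len(candidates)
    triple_counts = Counter(candidates)
    verb_counts = Counter(verb for verb, _, _ in candidates)
    comp_counts = Counter((noun, case) for _, noun, case in candidates)

    scores = {}
    for (verb, noun, case), joint in triple_counts.items():
        p_joint = joint / total
        p_verb = verb_counts[verb] / total
        p_comp = comp_counts[(noun, case)] / total
        # PMI = log2( P(verb, complement) / (P(verb) * P(complement)) )
        scores[(verb, noun, case)] = math.log2(p_joint / (p_verb * p_comp))
    return scores


# Hypothetical toy candidate list (stemmed verb, stemmed noun, case tag).
toy_candidates = (
    [("vesz", "rész", "ACC")] * 3
    + [("vesz", "könyv", "ACC")] * 1
    + [("olvas", "könyv", "ACC")] * 4
)

scores = pmi_scores(toy_candidates)
# Print candidates from strongest to weakest association.
for triple, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(triple, round(score, 2))
```

In this toy data the fixed expression ("vesz", "rész", "ACC") receives a higher score than the free combination ("vesz", "könyv", "ACC"), which is the kind of contrast a statistical ranking is meant to expose. PMI is known to overrate rare pairs, so a real experiment would likely apply a frequency threshold or a different measure.
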
Year | Venue | Keywords |
---|---|---|
2003 | CLIN | rule based |
Field | DocType | Citations |
Computer science,Artificial intelligence,Natural language processing,Linguistics | Conference | 0 |
PageRank | References | Authors |
0.34 | 3 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Balázs Kis | 1 | 18 | 3.73 |
Begona Villada Moiron | 2 | 4 | 1.84 |
Tamas Biro | 3 | 4 | 2.20 |
Gosse Bouma | 4 | 483 | 70.88 |
Gabor Pohl | 5 | 3 | 1.15 |
Gabor Ugray | 6 | 3 | 0.81 |
John Nerbonne | 7 | 174 | 47.63 |
rijksuniversiteit groningen morphologic | 8 | 0 | 0.34 |