Methods for the Extraction of Hungarian Multi-Word Lexemes

Paper Info

Title
Methods for the Extraction of Hungarian Multi-Word Lexemes

Abstract
This paper describes an experiment on extracting Hungarian multi-word lexemes from a corpus using statistical methods. Corpus preparation (the addition of POS tags and stems) was done automatically. From the corpus, ⟨verb+noun+casemark⟩ patterns were extracted as collocation candidates. Evaluation shows that the statistical methods used by Villada Moiron (2004a) to identify Dutch V + PP collocations can also be applied to the Hungarian data. Some collocation types (such as verbal arguments) require special extraction methods, as explained in the evaluation section. Finally, we suggest that the extraction process can be further improved by blending statistical techniques with rule-based and dictionary-based methods.

Year	Venue	Keywords
2003	CLIN	rule based
Field	DocType	Citations
Computer science,Artificial intelligence,Natural language processing,Linguistics	Conference	0
PageRank	References	Authors
0.34	3	8

Authors (8 rows)


Name	Order	Citations	PageRank
Balázs Kis	1	18	3.73
Begona Villada Moiron	2	4	1.84
Tamas Biro	3	4	2.20
Gosse Bouma	4	483	70.88
Gabor Pohl	5	3	1.15
Gabor Ugray	6	3	0.81
John Nerbonne	7	174	47.63
Rijksuniversiteit Groningen MorphoLogic	8	0	0.34
