Title
Influence of accurate compound noun splitting on bilingual vocabulary extraction.
Abstract
The influence of compound noun splitting on a German-Polish bilingual vocabulary extraction task is investigated. To accomplish this, several unsupervised methods for increasingly accurate compound noun splitting are introduced. Bilingual evidence from a parallel German-Polish corpus and co-occurrence counts from the web are used to disambiguate compound noun analyses directly. These collected splits serve as training data for a probabilistic model that abstracts away from the errors made by the direct methods and reaches an F-measure of 95.10%. Furthermore, these methods are evaluated in terms of word alignment quality and extraction accuracy, where linguistically accurate methods are found to outperform the corpus-based methods proposed in the literature. A comparison of the alignment quality achieved with the best splitting method and with the baseline suggests that the effort to build supervised splitting methods might yield minimal or no performance gains.
Year
2008
DOI
10.1515/9783110211818.2.91
Venue
Text Translation Computational Processing
DocType
Conference
Volume
8
ISSN
1861-4272
Citations
8
PageRank
0.87
References
7
Authors
1
Name
Marcin Junczys-Dowmunt
Order
1
Citations
312
PageRank
24.24