Abstract | ||
---|---|---|
The influence of compound noun splitting on a German-Polish bilingual vocabulary extraction task is investigated. To accomplish this, several unsupervised methods for increasingly accurate compound noun splitting are introduced. Bilingual evidence from a parallel German-Polish corpus and co-occurrence counts from the web are used to disambiguate compound noun analyses directly. These collected splits serve as training data for a probabilistic model that abstracts away from the errors made by the direct methods and reaches an F-measure of 95.10%. Furthermore, these methods are evaluated in terms of word alignment quality and extraction accuracy, where linguistically accurate methods are found to outperform the corpus-based methods proposed in the literature. A comparison of the alignment quality achieved with the best splitting method and with perfect splitting implies that the effort to build supervised splitting methods might result in minimal or no performance gains. |
Year | DOI | Venue |
---|---|---|
2008 | 10.1515/9783110211818.2.91 | Text Translation Computational Processing |
DocType | Volume | ISSN |
Conference | 8 | 1861-4272 |
Citations | PageRank | References |
8 | 0.87 | 7 |
Authors | ||
1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Marcin Junczys-Dowmunt | 1 | 312 | 24.24 |