Title | ||
---|---|---|
Don’t Forget the Long Tail! A Comprehensive Analysis of Morphological Generalization in Bilingual Lexicon Induction |
Abstract | ||
---|---|---|
Human translators routinely have to translate rare inflections of words, due to the Zipfian distribution of words in a language. When translating from Spanish, a good translator would have no problem identifying the proper translation of a statistically rare inflection such as habláramos. Note that the lexeme itself, hablar, is relatively common. In this work, we investigate whether state-of-the-art bilingual lexicon inducers are capable of learning this kind of generalization. We introduce 40 morphologically complete dictionaries in 10 languages and evaluate three state-of-the-art models on the task of translating less frequent morphological forms. We demonstrate that the performance of state-of-the-art models drops considerably when evaluated on infrequent morphological inflections and then show that adding a simple morphological constraint at training time improves performance, proving that bilingual lexicon inducers can benefit from better encoding of morphology. |
Year | DOI | Venue |
---|---|---|
2019 | 10.18653/v1/D19-1090 | EMNLP/IJCNLP (1) |
DocType | Volume | Citations |
---|---|---|
Conference | D19-1 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Paula Czarnowska | 1 | 0 | 1.69 |
Sebastian Ruder | 2 | 424 | 28.13 |
Edouard Grave | 3 | 860 | 33.43 |
Ryan Cotterell | 4 | 0 | 12.51 |
Ann Copestake | 5 | 862 | 95.10 |