Title
Don’t Forget the Long Tail! A Comprehensive Analysis of Morphological Generalization in Bilingual Lexicon Induction
Abstract
Because words in a language follow a Zipfian distribution, human translators routinely have to translate rare inflections of words. When translating from Spanish, a good translator would have no problem identifying the proper translation of a statistically rare inflection such as habláramos, even though the lexeme itself, hablar, is relatively common. In this work, we investigate whether state-of-the-art bilingual lexicon inducers are capable of learning this kind of generalization. We introduce 40 morphologically complete dictionaries in 10 languages and evaluate three state-of-the-art models on the task of translating less frequent morphological forms. We demonstrate that the performance of state-of-the-art models drops considerably when evaluated on infrequent morphological inflections, and then show that adding a simple morphological constraint at training time improves performance, proving that bilingual lexicon inducers can benefit from better encoding of morphology.
Year: 2019
DOI: 10.18653/v1/D19-1090
Venue: EMNLP/IJCNLP (1)
DocType: Conference
Volume: D19-1
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name
Order
Citations
PageRank
Paula Czarnowska101.69
Sebastian Ruder242428.13
Grave, Edouard386033.43
Ryan Cotterell4012.51
Ann Copestake586295.10