Title |
---|
Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation |
Abstract |
---|
Over the last few years, two promising research directions in low-resource neural machine translation (NMT) have emerged. The first focuses on utilizing high-resource languages to improve the quality of low-resource languages via multilingual NMT. The second direction employs monolingual data with self-supervision to pre-train translation models, followed by fine-tuning on small amounts of supervised data. In this work, we join these two lines of research and demonstrate the efficacy of monolingual data with self-supervision in multilingual NMT. We offer three major results: (i) Using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models. (ii) Self-supervision improves zero-shot translation quality in multilingual models. (iii) Leveraging monolingual data with self-supervision provides a viable path towards adding new languages to multilingual models, reaching up to 33 BLEU on Romanian-English (ro-en) translation without any parallel data or back-translation. |
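The abstract does not spell out the self-supervised objective, but a MASS-style masked-span denoising task over monolingual text, trained jointly with supervised translation in a language-token-tagged multilingual model, is the kind of setup it describes. The sketch below is a minimal illustration of how one such training example might be constructed; the `mass_example` helper, the `<2ro>` target-language token, and the 50% mask ratio are illustrative assumptions, not the authors' exact configuration.

```python
import random

MASK = "<mask>"

def mass_example(tokens, lang_token, mask_ratio=0.5, rng=random):
    """Build one MASS-style masked seq2seq example from a monolingual
    sentence: the encoder sees the sentence with a contiguous span
    replaced by <mask> tokens, and the decoder must reconstruct that
    span. Prepending a target-language token lets the same model be
    trained jointly on these examples and on supervised translation
    pairs (the usual multilingual NMT convention)."""
    n = len(tokens)
    span_len = max(1, int(n * mask_ratio))
    start = rng.randint(0, n - span_len)  # inclusive bounds
    enc_in = ([lang_token] + tokens[:start]
              + [MASK] * span_len + tokens[start + span_len:])
    dec_out = tokens[start:start + span_len]  # only the hidden span
    return enc_in, dec_out

# A monolingual Romanian sentence becomes a self-supervised example
# tagged with its own language token; during training such examples
# would be mixed into the same batches as parallel translation pairs.
enc, dec = mass_example("acesta este un exemplu simplu".split(), "<2ro>")
print(enc)  # e.g. ['<2ro>', 'acesta', '<mask>', '<mask>', 'exemplu', 'simplu']
print(dec)  # e.g. ['este', 'un']
```

Mixing self-supervised examples like these with parallel data in a single multilingual model is what, per the abstract, allows a new language such as Romanian to be added without any ro-en parallel data or back-translation.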
Year | Venue | DocType |
---|---|---
2020 | ACL | Conference |
ISSN | Citations | PageRank
---|---|---
ACL 2020 | 0 | 0.34

References | Authors
---|---
0 | 8
Name | Order | Citations | PageRank |
---|---|---|---
Aditya Siddhant | 1 | 7 | 4.46 |
Ankur Bapna | 2 | 36 | 8.45 |
Yuan Cao | 3 | 548 | 35.60 |
Orhan Firat | 4 | 281 | 29.13 |
Xu Chen | 5 | 30 | 5.73 |
Sneha Kudugunta | 6 | 17 | 1.35 |
Naveen Arivazhagan | 7 | 24 | 3.98 |
Yonghui Wu | 8 | 1065 | 72.78 |