Title
---
Redistributing Low-Frequency Words: Making the Most of Monolingual Data in Non-Autoregressive Translation
Abstract
---
Knowledge distillation (KD) is the preliminary step for training non-autoregressive translation (NAT) models, which eases their training at the cost of losing information that is important for translating low-frequency words. In this work, we provide an appealing alternative for NAT –
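As background to the abstract, the sketch below illustrates sequence-level knowledge distillation as the preliminary NAT training step: an autoregressive teacher re-translates the source side, and the NAT student is trained on the teacher's outputs instead of the original references. This is a minimal, hypothetical illustration of the general technique, not the paper's actual code; all class and method names are placeholders.

```python
# A minimal sketch of sequence-level KD for NAT (hypothetical names throughout).
from typing import List, Tuple

class TeacherAT:
    """Stand-in for a trained autoregressive teacher model."""
    def translate(self, src: str) -> str:
        # A real teacher would beam-search a translation; here we just echo.
        return src

class StudentNAT:
    """Stand-in for a non-autoregressive student model."""
    def training_step(self, src: str, tgt: str) -> None:
        pass  # one gradient update on a (source, distilled target) pair

def distill(teacher: TeacherAT, sources: List[str]) -> List[Tuple[str, str]]:
    # Replace each reference with the teacher's output. This simplifies the
    # target distribution, but tends to drop low-frequency words -- the loss
    # the abstract says this paper sets out to address.
    return [(src, teacher.translate(src)) for src in sources]

if __name__ == "__main__":
    pairs = distill(TeacherAT(), ["ein Beispielsatz", "noch ein Satz"])
    student = StudentNAT()
    for src, tgt in pairs:
        student.training_step(src, tgt)
```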
Year | DOI | Venue
---|---|---
2022 | 10.18653/v1/2022.acl-long.172 | Annual Meeting of the Association for Computational Linguistics

DocType | Volume | Citations
---|---|---
Conference | Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 5
Name | Order | Citations | PageRank
---|---|---|---
Liang Ding | 1 | 161 | 17.45
Longyue Wang | 2 | 72 | 18.24
Shuming Shi | 3 | 620 | 58.27
Dacheng Tao | 4 | 19032 | 747.78
Zhaopeng Tu | 5 | 518 | 39.95