Title
Redistributing Low-Frequency Words: Making the Most of Monolingual Data in Non-Autoregressive Translation.
Abstract
Knowledge distillation (KD) is the preliminary step for training non-autoregressive translation (NAT) models, which eases the training of NAT models at the cost of losing important information for translating low-frequency words. In this work, we provide an appealing alternative for NAT –
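The abstract refers to sequence-level knowledge distillation as the standard preliminary step for NAT training: an autoregressive (AT) teacher re-translates the source side of the bilingual data, and the NAT student is trained on the teacher's outputs instead of the original references. A minimal sketch of that step is given below; the Hugging Face MarianMT checkpoint and the distill helper are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of sequence-level KD data generation for NAT.
# Checkpoint name and helper are illustrative only.
from transformers import MarianMTModel, MarianTokenizer

TEACHER = "Helsinki-NLP/opus-mt-en-de"   # assumed AT teacher checkpoint
tokenizer = MarianTokenizer.from_pretrained(TEACHER)
teacher = MarianMTModel.from_pretrained(TEACHER)

def distill(sources, batch_size=32):
    """Re-translate source sentences with the AT teacher (beam search)."""
    distilled = []
    for i in range(0, len(sources), batch_size):
        batch = tokenizer(sources[i:i + batch_size],
                          return_tensors="pt", padding=True, truncation=True)
        outputs = teacher.generate(**batch, num_beams=5, max_length=256)
        distilled.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return distilled

# The NAT student is then trained on (source, distilled_target) pairs;
# the abstract notes this eases training but loses information about
# low-frequency words, which motivates the paper's alternative.
```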
Year
2022
DOI
10.18653/v1/2022.acl-long.172
Venue
Annual Meeting of the Association for Computational Linguistics
DocType
Conference
Volume
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Citations
0
PageRank
0.34
References
0
Authors
5
Name            Order   Citations / PageRank
Liang Ding      1       16117.45
Longyue Wang    2       7218.24
Shuming Shi     3       62058.27
Dacheng Tao     4       19032747.78
Zhaopeng Tu     5       51839.95