Abstract
Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up model training and boost translation performance. In this work, we identify a critical side-effect of pre-training for NMT, which stems from the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective trains the model to reconstruct a few source tokens and copy most of them, the pre-trained initialization affects the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called copying ratio, which empirically shows that pre-training-based NMT models have a larger copying ratio than standard ones. In response to this problem, we propose a simple and effective method, called copying penalty, to control the copying behavior during decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty method consistently improves translation performance by controlling copying behaviors of pre-training-based NMT models. Source code is freely available at https://github.com/SunbowLiu/CopyingPenalty.
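The abstract does not spell out how the copying ratio is computed or how the copying penalty enters decoding; for the exact formulation, see the linked repository. The sketch below is one plausible reading under two assumptions: the copying ratio is the fraction of target tokens that also occur in the source (with multiplicity), and the copying penalty subtracts a tuned multiple of that ratio from a hypothesis score. The names `copying_ratio`, `rerank_with_copying_penalty`, and the weight `alpha` are illustrative and not taken from the authors' code.

```python
# Hypothetical sketch (not the released implementation): measure a copying
# ratio and apply a simple copying penalty when rescoring hypotheses.
from collections import Counter
from typing import List, Tuple


def copying_ratio(source: List[str], hypothesis: List[str]) -> float:
    """Fraction of hypothesis tokens that also occur in the source.

    Assumption: a token counts as "copied" if it matches a source token,
    with multiplicity capped by how often it appears in the source.
    """
    if not hypothesis:
        return 0.0
    src_counts = Counter(source)
    copied = 0
    for tok in hypothesis:
        if src_counts[tok] > 0:
            copied += 1
            src_counts[tok] -= 1
    return copied / len(hypothesis)


def rerank_with_copying_penalty(
    source: List[str],
    hypotheses: List[Tuple[List[str], float]],  # (tokens, model log-prob)
    alpha: float = 1.0,  # assumed penalty strength, tuned on a dev set
) -> List[Tuple[List[str], float]]:
    """Downweight hypotheses that copy too much from the source.

    Adjusted score = log-prob - alpha * copying_ratio, so a larger alpha
    penalizes copy-heavy translations more strongly.
    """
    rescored = [
        (hyp, logp - alpha * copying_ratio(source, hyp))
        for hyp, logp in hypotheses
    ]
    return sorted(rescored, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    src = "the cat sat on the mat".split()
    hyps = [
        ("the cat sat on the mat".split(), -1.0),          # pure copy
        ("le chat est assis sur le tapis".split(), -1.2),  # actual translation
    ]
    for hyp, score in rerank_with_copying_penalty(src, hyps, alpha=0.5):
        print(f"{score:.2f}  ratio={copying_ratio(src, hyp):.2f}  {' '.join(hyp)}")
```

With `alpha = 0.5`, the copy-heavy hypothesis is pushed below the genuine translation even though its raw log-probability is higher, which is the kind of behavior the copying penalty is meant to induce.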
Year | Venue | DocType
---|---|---
2021 | ACL/IJCNLP | Conference

Volume | Citations | PageRank
---|---|---
2021.findings-acl | 0 | 0.34

References | Authors
---|---
0 | 7
Name | Order | Citations | PageRank |
---|---|---|---|
Xuebo Liu | 1 | 37 | 4.91 |
Longyue Wang | 2 | 72 | 18.24 |
Derek F. Wong | 3 | 82 | 19.81 |
Liang Ding | 4 | 161 | 17.45
Lidia S. Chao | 5 | 113 | 22.42 |
Shuming Shi | 6 | 620 | 58.27 |
Zhaopeng Tu | 7 | 518 | 39.95 |