Abstract
Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up model training and boost translation performance. In this work, we identify a critical side-effect of pre-training for NMT, which stems from the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective trains the model to reconstruct a few source tokens and copy most of them, the pre-trained initialization affects the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called copying ratio, which empirically shows that pre-training-based NMT models have a larger copying ratio than standard ones. In response to this problem, we propose a simple and effective method, called copying penalty, to control the copying behavior during decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty method consistently improves translation performance by controlling copying behaviors of pre-training-based NMT models. Source code is freely available at https://github.com/SunbowLiu/CopyingPenalty.
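The abstract does not spell out how the copying ratio is computed or how the copying penalty enters decoding; for the exact formulation, see the linked repository. The sketch below is one plausible reading under two assumptions: the copying ratio is the fraction of target tokens that also occur in the source (with multiplicity), and the copying penalty subtracts a tuned multiple of that ratio from a hypothesis score. The names `copying_ratio`, `rerank_with_copying_penalty`, and the weight `alpha` are illustrative and not taken from the authors' code.

```python
# Hypothetical sketch (not the released implementation): measure a copying
# ratio and apply a simple copying penalty when rescoring hypotheses.
from collections import Counter
from typing import List, Tuple


def copying_ratio(source: List[str], hypothesis: List[str]) -> float:
    """Fraction of hypothesis tokens that also occur in the source.

    Assumption: a token counts as "copied" if it matches a source token,
    with multiplicity capped by how often it appears in the source.
    """
    if not hypothesis:
        return 0.0
    src_counts = Counter(source)
    copied = 0
    for tok in hypothesis:
        if src_counts[tok] > 0:
            copied += 1
            src_counts[tok] -= 1
    return copied / len(hypothesis)


def rerank_with_copying_penalty(
    source: List[str],
    hypotheses: List[Tuple[List[str], float]],  # (tokens, model log-prob)
    alpha: float = 1.0,  # assumed penalty strength, tuned on a dev set
) -> List[Tuple[List[str], float]]:
    """Downweight hypotheses that copy too much from the source.

    Adjusted score = log-prob - alpha * copying_ratio, so a larger alpha
    penalizes copy-heavy translations more strongly.
    """
    rescored = [
        (hyp, logp - alpha * copying_ratio(source, hyp))
        for hyp, logp in hypotheses
    ]
    return sorted(rescored, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    src = "the cat sat on the mat".split()
    hyps = [
        ("the cat sat on the mat".split(), -1.0),          # pure copy
        ("le chat est assis sur le tapis".split(), -1.2),  # actual translation
    ]
    for hyp, score in rerank_with_copying_penalty(src, hyps, alpha=0.5):
        print(f"{score:.2f}  ratio={copying_ratio(src, hyp):.2f}  {' '.join(hyp)}")
```

With `alpha = 0.5`, the copy-heavy hypothesis is pushed below the genuine translation even though its raw log-probability is higher, which is the kind of behavior the copying penalty is meant to induce.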
Year | Venue | DocType
---|---|---
2021 | ACL/IJCNLP | Conference

Volume | Citations | PageRank
---|---|---
2021.findings-acl | 0 | 0.34

References | Authors
---|---
0 | 7
Name | Order | Citations | PageRank |
---|---|---|---|
Xuebo Liu | 1 | 37 | 4.91 |
Longyue Wang | 2 | 72 | 18.24 |
Derek F. Wong | 3 | 82 | 19.81 |
Liang Ding | 4 | 161 | 17.45
Lidia S. Chao | 5 | 113 | 22.42 |
Shuming Shi | 6 | 620 | 58.27 |
Zhaopeng Tu | 7 | 518 | 39.95 |