Title
What Works and Doesn't Work, A Deep Decoder for Neural Machine Translation
Abstract
Deep learning has demonstrated performance advantages in a wide range of natural language processing tasks, including neural machine translation (NMT). Transformer NMT models are typically strengthened by deeper encoder layers, but deepening their decoder layers usually results in failure. In this paper, we first identify the cause of the failure of the deep decoder in the Transformer model. Inspired by this discovery, we then propose approaches to improving it, with respect to model structure and model training, to make the deep decoder practical in NMT. Specifically, with respect to model structure, we propose a cross-attention drop mechanism to allow the decoder layers to perform their own different roles, to reduce the difficulty of deep-decoder learning. For model training, we propose a collapse reducing training approach to improve the stability and effectiveness of deep-decoder training. We experimentally evaluated our proposed Transformer NMT model structure modification and novel training methods on several popular machine translation benchmarks. The results showed that deepening the NMT model by increasing the number of decoder layers successfully prevented the deepened decoder from degrading to an unconditional language model. In contrast to prior work on deepening an NMT model on the encoder, our method can deepen the model on both the encoder and decoder at the same time, resulting in a deeper model and improved performance.
Year
DOI
Venue
2022
10.18653/v1/2022.findings-acl.39
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022)
DocType
Volume
Citations 
Conference
Findings of the Association for Computational Linguistics: ACL 2022
0
PageRank 
References 
Authors
0.34
0
6
Name
Order
Citations
PageRank
Zuchao Li13512.61
Yiran Wang200.34
Masao Utiyama371486.69
Eiichiro SUMITA41466190.87
Hai Zhao5960113.64
Taro Watanabe657236.86