What Works and Doesn't Work, A Deep Decoder for Neural Machine Translation - Citegraph

Paper Info

Title
What Works and Doesn't Work, A Deep Decoder for Neural Machine Translation

Abstract
Deep learning has demonstrated performance advantages in a wide range of natural language processing tasks, including neural machine translation (NMT). Transformer NMT models are typically strengthened by deeper encoder layers, but deepening their decoder layers usually results in failure. In this paper, we first identify the cause of the failure of the deep decoder in the Transformer model. Inspired by this discovery, we then propose approaches to improving it, with respect to model structure and model training, to make the deep decoder practical in NMT. Specifically, with respect to model structure, we propose a cross-attention drop mechanism to allow the decoder layers to perform their own different roles, to reduce the difficulty of deep-decoder learning. For model training, we propose a collapse reducing training approach to improve the stability and effectiveness of deep-decoder training. We experimentally evaluated our proposed Transformer NMT model structure modification and novel training methods on several popular machine translation benchmarks. The results showed that deepening the NMT model by increasing the number of decoder layers successfully prevented the deepened decoder from degrading to an unconditional language model. In contrast to prior work on deepening an NMT model on the encoder, our method can deepen the model on both the encoder and decoder at the same time, resulting in a deeper model and improved performance.

Year	2022
DOI	10.18653/v1/2022.findings-acl.39
Venue	Findings of the Association for Computational Linguistics (ACL 2022)
DocType	Conference
Volume	Findings of the Association for Computational Linguistics: ACL 2022
Citations	0
PageRank	0.34
References	0
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Zuchao Li	1	35	12.61
Yiran Wang	2	0	0.34
Masao Utiyama	3	714	86.69
Eiichiro Sumita	4	1466	190.87
Hai Zhao	5	960	113.64
Taro Watanabe	6	572	36.86
