Title
Cross Aggregation of Multi-head Attention for Neural Machine Translation.
Abstract
The Transformer-based encoder has been the state-of-the-art model for neural machine translation, and it relies on a key design called self-attention. Multi-head attention in the self-attention network (SAN) plays a significant role in extracting information from the input in different subspaces for each pair of tokens. However, the information captured by each token on a specific head, which is explicitly represented by the attention weights, is independent of the other heads and tokens, meaning that the global structure is not taken into account. Moreover, since the SAN does not use an RNN-like recurrent structure, its ability to model relative position and sequential information is weakened. In this paper, we propose a method named Cross Aggregation, together with an iterative routing-by-agreement algorithm, to alleviate these problems. Experimental results on machine translation tasks show that our method helps the model significantly outperform the strong Transformer baseline.
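The aggregation described in the abstract builds on the routing-by-agreement idea from capsule networks. The Python/NumPy sketch below illustrates a generic iterative routing procedure applied to per-head outputs for a single token position; it is a minimal sketch under stated assumptions (the random projection W stands in for learned per-head transforms, and names such as route_by_agreement are hypothetical), not the paper's actual Cross Aggregation implementation.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(v, axis=-1, eps=1e-8):
    # Capsule-style non-linearity: keeps the direction, squashes the norm into [0, 1).
    norm_sq = (v ** 2).sum(axis=axis, keepdims=True)
    return norm_sq / (1.0 + norm_sq) * v / np.sqrt(norm_sq + eps)

def route_by_agreement(head_outputs, num_out_capsules=4, iterations=3, seed=0):
    # head_outputs: (num_heads, d_model), the representation each head produces
    # for one token position. Returns (num_out_capsules, d_model).
    rng = np.random.default_rng(seed)
    num_heads, d_model = head_outputs.shape

    # Votes u_hat[i, j]: what input capsule (head) i predicts for output capsule j.
    # A random projection replaces the learned transform W_ij in this sketch.
    W = rng.standard_normal((num_heads, num_out_capsules, d_model, d_model)) * 0.1
    u_hat = np.einsum('ijdk,ik->ijd', W, head_outputs)   # (heads, out_caps, d_model)

    # Routing logits b start at zero: every head contributes equally at first.
    b = np.zeros((num_heads, num_out_capsules))
    for _ in range(iterations):
        c = softmax(b, axis=1)                 # coupling coefficients per head
        s = np.einsum('ij,ijd->jd', c, u_hat)  # weighted sum of votes
        v = squash(s)                          # output capsules
        # Raise the logit where a head's vote agrees with the resulting capsule.
        b = b + np.einsum('ijd,jd->ij', u_hat, v)
    return v

if __name__ == "__main__":
    heads = np.random.default_rng(1).standard_normal((8, 64))  # 8 heads, d_model = 64
    print(route_by_agreement(heads).shape)                     # (4, 64)

In this sketch the coupling coefficients c are recomputed from the agreement between each head's vote and the aggregated capsule, so heads whose information is consistent across the group gradually dominate the aggregation.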
Year
2019
DOI
10.1007/978-3-030-32233-5_30
Venue
Lecture Notes in Artificial Intelligence
Keywords
Machine translation, Attention mechanism, Information aggregation
DocType
Conference
Volume
11838
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
3
Name, Order, Citations, PageRank
Juncheng Cao, 1, 0, 0.34
Hai Zhao, 2, 960, 113.64
Kai Yu, 3, 25, 4.47