Title
Cross Aggregation of Multi-head Attention for Neural Machine Translation.
Abstract
The Transformer-based encoder has been the state-of-the-art model for neural machine translation, and it relies on a key design called self-attention. Multi-head attention in the self-attention network (SAN) plays a significant role in extracting information from the input in different subspaces for each pair of tokens. However, the information captured by each token on a specific head, which is explicitly represented by the attention weights, is independent of the other heads and tokens, meaning that the global structure is not taken into account. Moreover, since the SAN does not use an RNN-like recurrent structure, its ability to model relative position and sequential information is weakened. In this paper, we propose a method named Cross Aggregation, together with an iterative routing-by-agreement algorithm, to alleviate these problems. Experimental results on machine translation tasks show that our method helps the model significantly outperform the strong Transformer baseline.
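The aggregation described in the abstract builds on the routing-by-agreement idea from capsule networks. The Python/NumPy sketch below illustrates a generic iterative routing procedure applied to per-head outputs for a single token position; it is a minimal sketch under stated assumptions (the random projection W stands in for learned per-head transforms, and names such as route_by_agreement are hypothetical), not the paper's actual Cross Aggregation implementation.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(v, axis=-1, eps=1e-8):
    # Capsule-style non-linearity: keeps the direction, squashes the norm into [0, 1).
    norm_sq = (v ** 2).sum(axis=axis, keepdims=True)
    return norm_sq / (1.0 + norm_sq) * v / np.sqrt(norm_sq + eps)

def route_by_agreement(head_outputs, num_out_capsules=4, iterations=3, seed=0):
    # head_outputs: (num_heads, d_model), the representation each head produces
    # for one token position. Returns (num_out_capsules, d_model).
    rng = np.random.default_rng(seed)
    num_heads, d_model = head_outputs.shape

    # Votes u_hat[i, j]: what input capsule (head) i predicts for output capsule j.
    # A random projection replaces the learned transform W_ij in this sketch.
    W = rng.standard_normal((num_heads, num_out_capsules, d_model, d_model)) * 0.1
    u_hat = np.einsum('ijdk,ik->ijd', W, head_outputs)   # (heads, out_caps, d_model)

    # Routing logits b start at zero: every head contributes equally at first.
    b = np.zeros((num_heads, num_out_capsules))
    for _ in range(iterations):
        c = softmax(b, axis=1)                 # coupling coefficients per head
        s = np.einsum('ij,ijd->jd', c, u_hat)  # weighted sum of votes
        v = squash(s)                          # output capsules
        # Raise the logit where a head's vote agrees with the resulting capsule.
        b = b + np.einsum('ijd,jd->ij', u_hat, v)
    return v

if __name__ == "__main__":
    heads = np.random.default_rng(1).standard_normal((8, 64))  # 8 heads, d_model = 64
    print(route_by_agreement(heads).shape)                     # (4, 64)

In this sketch the coupling coefficients c are recomputed from the agreement between each head's vote and the aggregated capsule, so heads whose information is consistent across the group gradually dominate the aggregation.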
Year
2019
DOI
10.1007/978-3-030-32233-5_30
Venue
Lecture Notes in Artificial Intelligence
Keywords
Machine translation, Attention mechanism, Information aggregation
DocType
Conference
Volume
11838
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
3
Name, Order, Citations, PageRank
Juncheng Cao, 1, 0, 0.34
Hai Zhao, 2, 960, 113.64
Kai Yu, 3, 25, 4.47