Title
Multi-modal neural machine translation with deep semantic interactions
Abstract
Based on the conventional attentional encoder-decoder framework, multi-modal neural machine translation (NMT) further incorporates spatial visual features through a separate visual attention mechanism. However, most current multi-modal NMT models first learn the semantic representations of text and image separately and then independently produce two modalities of context vectors for word prediction, neglecting the semantic interactions between them. In this paper, we argue that learning text-image semantic interactions is a more reasonable way to jointly model the two modalities for multi-modal NMT, and we propose a novel multi-modal NMT model with deep semantic interactions. Specifically, our model extends conventional multi-modal NMT with the following two attention networks: (1) a bi-directional attention network for modeling text and image representations, where the semantic representations of the text are learned by referring to the image representations, and vice versa; (2) a co-attention network for refining the text and image context vectors, which first summarizes the text into a context vector and then uses it to attend to the image, obtaining a text-aware visual context vector. The final context vector is computed by re-attending to the text with this visual context vector. Results on the Multi30k dataset for different language pairs show that our model significantly outperforms state-of-the-art baselines. We have released our code at https://github.com/DeepLearnXMU/MNMT.
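To make the co-attention pipeline described above concrete (summarize the text into a context vector, attend with it over the image to get a text-aware visual context vector, then re-attend to the text for the final context vector), the following is a minimal PyTorch sketch. It is not the released implementation from the repository above; the module name CoAttention, the linear scoring layers, and all tensor names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoAttention(nn.Module):
    """Sketch of the co-attention step: text context -> text-aware visual context -> final context."""

    def __init__(self, dim):
        super().__init__()
        self.text_score = nn.Linear(dim, 1)       # scores each source word
        self.img_score = nn.Linear(2 * dim, 1)    # scores each image region given the text context
        self.final_score = nn.Linear(2 * dim, 1)  # re-scores source words given the visual context

    def forward(self, text_states, img_feats):
        # text_states: (batch, src_len, dim); img_feats: (batch, regions, dim)
        # 1) summarize the text into a context vector
        a_t = F.softmax(self.text_score(text_states), dim=1)            # (batch, src_len, 1)
        text_ctx = (a_t * text_states).sum(dim=1)                       # (batch, dim)
        # 2) attend with the text context over the image -> text-aware visual context
        q = text_ctx.unsqueeze(1).expand(-1, img_feats.size(1), -1)     # (batch, regions, dim)
        a_v = F.softmax(self.img_score(torch.cat([img_feats, q], dim=-1)), dim=1)
        vis_ctx = (a_v * img_feats).sum(dim=1)                          # (batch, dim)
        # 3) re-attend to the text with the visual context -> final context vector
        r = vis_ctx.unsqueeze(1).expand(-1, text_states.size(1), -1)    # (batch, src_len, dim)
        a_f = F.softmax(self.final_score(torch.cat([text_states, r], dim=-1)), dim=1)
        final_ctx = (a_f * text_states).sum(dim=1)                      # (batch, dim)
        return final_ctx, vis_ctx


if __name__ == "__main__":
    # toy example: batch of 2, 5 source words, 49 image regions, hidden size 512
    co_att = CoAttention(512)
    final_ctx, vis_ctx = co_att(torch.randn(2, 5, 512), torch.randn(2, 49, 512))
    print(final_ctx.shape, vis_ctx.shape)  # torch.Size([2, 512]) torch.Size([2, 512])
```

This sketch reuses the same simple linear scoring at each step and omits the bi-directional attention encoder; the actual model may use different attention functions and integrate these context vectors into the decoder's word-prediction step.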
Year: 2021
DOI: 10.1016/j.ins.2020.11.024
Venue: Information Sciences
Keywords: Multi-modal neural machine translation, Semantic interaction, Bi-directional attention, Co-attention
DocType: Journal
Volume: 554
ISSN: 0020-0255
Citations: 2
PageRank: 0.46
References: 0
Authors: 8
Author list (in order):
1. Jinsong Su
2. Jinchang Chen
3. Hui Jiang
4. Chulun Zhou
5. Huan Lin
6. Ge Yubin
7. Qingqiang Wu
8. Yongxuan Lai