Title
Multimodal Fusion with Co-attention Mechanism
Abstract
Because information from different modalities complements each other when describing the same content, multimodal information can be used to obtain better feature representations. Thus, how to represent and fuse the relevant information has become an active research topic. Most existing feature fusion methods consider different levels of feature representation, but they ignore the significant relevance between local regions, especially in the high-level semantic representation. In this paper, a general multimodal fusion method based on the co-attention mechanism is proposed, which is similar to the transformer structure. We address two main issues: (1) improving the applicability and generality of the transformer to different modal data; (2) making the fusion more robust by capturing and transmitting the relevant information between local features before fusion. We evaluate our model on a multimodal classification task, and the experiments demonstrate that our model can learn fused feature representations effectively.
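The abstract describes a transformer-style co-attention block in which each modality's local features attend to the other modality's local features before fusion. The following is a minimal PyTorch sketch of that general idea, not the authors' exact architecture: the class name CoAttentionFusion, the dimensions, the mean-pooling, and the concatenation-based fusion head are all illustrative assumptions, since the record gives no implementation details.

```python
# Minimal sketch of transformer-style co-attention fusion (assumed design,
# not the paper's exact model).
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    """Each modality attends to the other; the attended local features are
    pooled and concatenated into a fused representation for classification."""

    def __init__(self, dim: int = 256, num_heads: int = 4, num_classes: int = 10):
        super().__init__()
        # Cross-attention in both directions: modality A queries B, and vice versa.
        self.attn_a2b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b2a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
        # feats_a: (batch, regions_a, dim) local features of modality A (e.g. image regions)
        # feats_b: (batch, tokens_b, dim) local features of modality B (e.g. text tokens)
        attended_a, _ = self.attn_a2b(feats_a, feats_b, feats_b)  # A enriched by B
        attended_b, _ = self.attn_b2a(feats_b, feats_a, feats_a)  # B enriched by A
        a = self.norm_a(feats_a + attended_a)  # residual + layer norm, as in a transformer
        b = self.norm_b(feats_b + attended_b)
        fused = torch.cat([a.mean(dim=1), b.mean(dim=1)], dim=-1)  # pool and concatenate
        return self.classifier(fused)


# Usage: fuse 36 image-region features with 20 text-token features.
model = CoAttentionFusion(dim=256, num_heads=4, num_classes=10)
logits = model(torch.randn(8, 36, 256), torch.randn(8, 20, 256))
print(logits.shape)  # torch.Size([8, 10])
```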
Year
2020
DOI
10.23919/FUSION45008.2020.9190483
Venue
2020 IEEE 23rd International Conference on Information Fusion (FUSION)
Keywords
Multimodal feature fusion, Co-attention mechanism, Transformer, Deep neural network
DocType
Conference
ISBN
978-1-7281-6830-2
Citations
0
PageRank
0.34
References
4
Authors
2
Name        Order  Citations  PageRank
Pei Li      1      0          0.34
Xinde Li    2      50         11.00