Title
Multimodal Fusion with Co-attention Mechanism
Abstract
Because information from different modalities complements each other when describing the same content, multimodal information can be used to obtain better feature representations. Thus, how to represent and fuse the relevant information has become an active research topic. Most existing feature fusion methods consider different levels of feature representation, but they ignore the significant relevance between local regions, especially in the high-level semantic representation. In this paper, a general multimodal fusion method based on the co-attention mechanism is proposed, which is similar to the transformer structure. We address two main issues: (1) improving the applicability and generality of the transformer to different modal data; (2) making the fusion more robust by capturing and transmitting the relevant information between local features before fusion. We evaluate our model on a multimodal classification task, and the experiments demonstrate that our model can learn fused feature representations effectively.
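The abstract describes a transformer-style co-attention block in which each modality's local features attend to the other modality's local features before fusion. The following is a minimal PyTorch sketch of that general idea, not the authors' exact architecture: the class name CoAttentionFusion, the dimensions, the mean-pooling, and the concatenation-based fusion head are all illustrative assumptions, since the record gives no implementation details.

```python
# Minimal sketch of transformer-style co-attention fusion (assumed design,
# not the paper's exact model).
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    """Each modality attends to the other; the attended local features are
    pooled and concatenated into a fused representation for classification."""

    def __init__(self, dim: int = 256, num_heads: int = 4, num_classes: int = 10):
        super().__init__()
        # Cross-attention in both directions: modality A queries B, and vice versa.
        self.attn_a2b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b2a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
        # feats_a: (batch, regions_a, dim) local features of modality A (e.g. image regions)
        # feats_b: (batch, tokens_b, dim) local features of modality B (e.g. text tokens)
        attended_a, _ = self.attn_a2b(feats_a, feats_b, feats_b)  # A enriched by B
        attended_b, _ = self.attn_b2a(feats_b, feats_a, feats_a)  # B enriched by A
        a = self.norm_a(feats_a + attended_a)  # residual + layer norm, as in a transformer
        b = self.norm_b(feats_b + attended_b)
        fused = torch.cat([a.mean(dim=1), b.mean(dim=1)], dim=-1)  # pool and concatenate
        return self.classifier(fused)


# Usage: fuse 36 image-region features with 20 text-token features.
model = CoAttentionFusion(dim=256, num_heads=4, num_classes=10)
logits = model(torch.randn(8, 36, 256), torch.randn(8, 20, 256))
print(logits.shape)  # torch.Size([8, 10])
```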
Year
2020
DOI
10.23919/FUSION45008.2020.9190483
Venue
2020 IEEE 23rd International Conference on Information Fusion (FUSION)
Keywords
Multimodal feature fusion, Co-attention mechanism, Transformer, Deep neural network
DocType
Conference
ISBN
978-1-7281-6830-2
Citations
0
PageRank
0.34
References
4
Authors
2
Name        Order  Citations  PageRank
Pei Li      1      0          0.34
Xinde Li    2      50         11.00