Title
Dynamic interaction networks for image-text multimodal learning
Abstract
Recently, there has been a surge of interest in image-text multimodal representation learning, and many neural-network-based models have been proposed to capture the interaction between the two modalities with different forms of functions. Despite their success, a potential limitation of these methods is that a single set of static parameters is insufficient to model all kinds of interactions. To alleviate this problem, we present a dynamic interaction network in which the parameters of the interaction function are generated dynamically by a meta network. Additionally, to provide the multimodal features that the meta network needs, we propose a new neural module called the Multimodal Transformer. Experimentally, we not only conduct a comprehensive quantitative evaluation on four image-text tasks but also present interpretable analyses of our models, revealing the internal working mechanism of dynamic parameter learning.
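The abstract's core idea, generating the parameters of the interaction function on the fly from multimodal features rather than learning one static set, is a hypernetwork-style design. Below is a minimal PyTorch sketch of that general idea; the module name DynamicInteraction, the dimensions, and the concatenation-based fusion are illustrative assumptions, not the paper's actual architecture (which feeds the meta network with features from its Multimodal Transformer module).

```python
# Minimal sketch of dynamic parameter generation by a meta network.
# All names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn


class DynamicInteraction(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512, hid_dim=256):
        super().__init__()
        self.txt_dim = txt_dim
        self.hid_dim = hid_dim
        # Meta network: maps the fused multimodal feature to the parameters
        # (weight and bias) of a per-example linear interaction function.
        self.meta = nn.Linear(img_dim + txt_dim, hid_dim * txt_dim + hid_dim)

    def forward(self, img_feat, txt_feat):
        # img_feat: (B, img_dim), txt_feat: (B, txt_dim)
        b = img_feat.size(0)
        params = self.meta(torch.cat([img_feat, txt_feat], dim=-1))
        w, bias = params.split([self.hid_dim * self.txt_dim, self.hid_dim], dim=-1)
        w = w.view(b, self.hid_dim, self.txt_dim)
        # Apply the dynamically generated linear map to the text features:
        # each example gets its own interaction weights.
        out = torch.bmm(w, txt_feat.unsqueeze(-1)).squeeze(-1) + bias
        return torch.relu(out)


if __name__ == "__main__":
    model = DynamicInteraction()
    img = torch.randn(4, 512)
    txt = torch.randn(4, 512)
    print(model(img, txt).shape)  # torch.Size([4, 256])
```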
Year
2020
DOI
10.1016/j.neucom.2019.10.103
Venue
Neurocomputing
Keywords
Multimodal learning, Dynamic parameters prediction, Deep neural networks
DocType
Journal
Volume
379
ISSN
0925-2312
Citations
1
PageRank
0.36
References
0
Authors
4
Name           Order  Citations  PageRank
Wenshan Wang   1      24         9.00
Pengfei Liu    2      58         7.83
Su Yang        3      110        14.58
Weishan Zhang  4      396        52.57