Title
Self-Adaptive Neural Module Transformer for Visual Question Answering
Abstract
Vision and language understanding is one of the most fundamental and difficult tasks in Multimedia Intelligence. Among these tasks, Visual Question Answering (VQA) is especially challenging because it requires complex reasoning steps to reach the correct answer. To achieve this, Neural Module Network (NMN) and its variants rely on parsing the natural language question into a module layout (i.e., a problem-solving program). In particular, this process follows a feedforward encoder-decoder pipeline: the encoder embeds the question into a static vector and the decoder generates the layout. However, we argue that such a conventional encoder-decoder neglects both the dynamic nature of question comprehension (i.e., we should attend to different words at different steps) and the per-module intermediate results (i.e., we should discard modules that perform badly) in the reasoning steps. In this paper, we present a novel NMN, called Self-Adaptive Neural Module Transformer (SANMT), which adaptively adjusts both the question feature encoding and the layout decoding by considering intermediate Q&A results. Specifically, we encode the intermediate results together with the given question features using a novel transformer module to generate a dynamic question feature embedding that evolves over the reasoning steps. In addition, the transformer uses the intermediate results from each reasoning step to guide the subsequent layout arrangement. Extensive experimental evaluations demonstrate the superiority of the proposed SANMT over NMN and its variants on four challenging benchmarks: CLEVR, CLEVR-CoGenT, VQAv1.0, and VQAv2.0 (on average, the relative accuracy improvements over NMN are 1.5, 2.3, 0.7, and 0.5 points, respectively).
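The contrast the abstract draws between a static question vector and a question embedding that evolves with intermediate results can be illustrated with a toy Python sketch. It assumes, hypothetically, that each reasoning step's intermediate result acts as the query of scaled dot-product attention over word features, and that a toy layout decoder simply picks the module whose key vector best matches the resulting embedding. The function names, the vectors, and the two modules `find` and `relate` are illustrative assumptions only; this is not the SANMT architecture or its transformer module.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def static_question_embedding(words: List[Vector]) -> Vector:
    # Simplified conventional encoder: one fixed vector for the whole question
    # (here the mean of the word features), reused at every reasoning step.
    n = len(words)
    return [sum(w[i] for w in words) / n for i in range(len(words[0]))]


def dynamic_question_embedding(words: List[Vector], intermediate: Vector) -> Vector:
    # Hypothetical self-adaptive step: the previous intermediate result is the
    # attention query, so different steps can emphasise different words.
    d = len(intermediate)
    weights = softmax([dot(w, intermediate) / math.sqrt(d) for w in words])
    return [sum(a * w[i] for a, w in zip(weights, words)) for i in range(d)]


def pick_module(question_vec: Vector, module_keys: Dict[str, Vector]) -> str:
    # Toy layout decoder: choose the module whose key best matches the embedding.
    return max(module_keys, key=lambda name: dot(question_vec, module_keys[name]))


if __name__ == "__main__":
    # Hypothetical 2-d word features and module keys, for illustration only.
    words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    modules = {"find": [1.0, 0.0], "relate": [0.0, 1.0]}
    print("static:", static_question_embedding(words))
    for step, result in enumerate([[4.0, 0.0], [0.0, 4.0]], start=1):
        q = dynamic_question_embedding(words, result)
        print(step, [round(v, 3) for v in q], pick_module(q, modules))
```

With these toy inputs the static vector stays the same at every step, while the dynamic embedding shifts toward different words as the intermediate result changes, so the chosen module changes from `find` at step 1 to `relate` at step 2.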
Year
2021
DOI
10.1109/TMM.2020.2995278
Venue
IEEE Transactions on Multimedia
Keywords
Visual question answering, neural module transformer, multi modal, self-adaptive
DocType
Journal
Volume
23
ISSN
1520-9210
Citations
3
PageRank
0.37
References
0
Authors
6
Name
Order
Citations
PageRank
Zhong Huasong130.37
Jingyuan Chen22287.50
chen shen310317.21
Hanwang Zhang4196578.34
Jianqiang Huang55519.18
Xian-Sheng Hua66566328.17