Title
BTDP: Toward Sparse Fusion with Block Term Decomposition Pooling for Visual Question Answering
Abstract
Bilinear models are very powerful in multimodal fusion tasks such as Visual Question Answering (VQA). The predominant bilinear methods can all be seen as a kind of tensor-based decomposition operation that contains a key kernel called the “core tensor.” Current approaches usually focus on reducing the computational complexity by applying a low-rank constraint on the core tensor. In this article, we propose a novel bilinear architecture called Block Term Decomposition Pooling (BTDP), which not only maintains the advantages of previous bilinear methods but also conducts sparse bilinear interactions between modalities. Our method is based on the Block Term Decomposition theory of tensors, which results in a sparse and learnable block-diagonal core tensor for multimodal fusion. We prove that using such a block-diagonal core tensor is equivalent to conducting many “tiny” bilinear operations in different feature spaces. Thus, introducing sparsity into the bilinear operation can significantly increase the performance of feature fusion and improve VQA models. Moreover, BTDP is very flexible in design. We develop several variants of BTDP and discuss the effects of the diagonal blocks of the core tensor. Extensive experiments on two challenging datasets, VQA-v1 and VQA-v2, show that our BTDP method outperforms current bilinear models, achieving state-of-the-art performance.
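The equivalence claimed in the abstract can be checked numerically on a toy example. The sketch below is only an illustration and not the authors' implementation: it assumes a core tensor whose R blocks all have the same shape (a, b, c), it omits the learned input and output projections and any nonlinearity a full model would use, and every function name and size in it is hypothetical. It uses plain Python lists so that it runs without extra dependencies.

```python
import math
import random


def bilinear(core, q, v):
    """Return z with z[k] = sum over i, j of core[i][j][k] * q[i] * v[j]."""
    n_out = len(core[0][0])
    return [
        sum(core[i][j][k] * q[i] * v[j]
            for i in range(len(q)) for j in range(len(v)))
        for k in range(n_out)
    ]


def block_diagonal_core(blocks):
    """Place R small cores of shape (a, b, c) on the diagonal of one large core."""
    R = len(blocks)
    a, b, c = len(blocks[0]), len(blocks[0][0]), len(blocks[0][0][0])
    core = [[[0.0] * (R * c) for _ in range(R * b)] for _ in range(R * a)]
    for r, block in enumerate(blocks):
        for i in range(a):
            for j in range(b):
                for k in range(c):
                    core[r * a + i][r * b + j][r * c + k] = block[i][j][k]
    return core


def blockwise_bilinear(blocks, q, v):
    """Run one tiny bilinear operation per block and concatenate the results."""
    a, b = len(blocks[0]), len(blocks[0][0])
    out = []
    for r, block in enumerate(blocks):
        q_r = q[r * a:(r + 1) * a]  # slice of the first modality seen by block r
        v_r = v[r * b:(r + 1) * b]  # slice of the second modality seen by block r
        out.extend(bilinear(block, q_r, v_r))
    return out


# Hypothetical toy sizes: R blocks, each of shape (a, b, c).
random.seed(0)
R, a, b, c = 4, 3, 5, 2
blocks = [[[[random.gauss(0, 1) for _ in range(c)]
            for _ in range(b)]
           for _ in range(a)]
          for _ in range(R)]
q = [random.gauss(0, 1) for _ in range(R * a)]
v = [random.gauss(0, 1) for _ in range(R * b)]

dense_out = bilinear(block_diagonal_core(blocks), q, v)
block_out = blockwise_bilinear(blocks, q, v)

dense_params = (R * a) * (R * b) * (R * c)  # entries in the full core tensor
block_params = R * a * b * c                # entries that can be nonzero

same = all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(dense_out, block_out))
print("outputs match:", same)
print("dense entries:", dense_params, "block entries:", block_params)
```

With these toy sizes the block-diagonal core stores 120 potentially nonzero entries where a dense core of the same outer shape would store 1920, which is the sense in which the fusion is sparse.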
Year
2019
DOI
10.1145/3282469
Venue
ACM Transactions on Multimedia Computing, Communications, and Applications
Keywords
Sparse bilinear pooling, block term decomposition pooling, visual question answering
DocType
Journal
Volume
15
Issue
2s
ISSN
1551-6857
Citations
1
PageRank
0.35
References
0
Authors
6
Name | Order | Citations | PageRank
Zhiwei Fang | 1 | 141 | 8.01
Jing Liu | 2 | 1781 | 88.09
Xueliang Liu | 3 | 102 | 12.33
Qu Tang | 4 | 2 | 1.04
Yong Li | 5 | 254 | 28.66
Hanqing Lu | 6 | 4620 | 291.38