Title
Two-Step Joint Attention Network for Visual Question Answering
Abstract
Visual Question Answering (VQA) is the task of automatically answering natural language questions about the content of a reference image. A common approach extracts an image feature and a question feature with deep neural networks, and then combines the two features through an attention mechanism to predict the answer. Most attention methods for VQA only consider which local regions of the image are relevant to the answer, and ignore that question words contribute to the answer with different weights. Hence, we propose a two-step joint attention that uses a combined representation of the image feature and the question feature to guide both visual attention and question attention. Two-step joint attention gradually focuses on the given image and question, moving from coarse-grained parts to fine-grained parts, to predict the answer. To extract image features precisely, we also propose BiSRU and use an RNN based on BiSRU so that adjacent local region vectors of the image can share information with each other. We demonstrate and analyze the effectiveness of our approach on the VQA dataset, and use visualization to show the results intuitively.
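The abstract's two-step joint attention can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's exact formulation: the bilinear scoring weights `Wv`/`Wq` and the mean-pooled initial context are hypothetical choices made only to show the flow of "joint context guides attention over both image regions and question words, twice".

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def joint_attention_step(V, Q, ctx, Wv, Wq):
    """One step of joint attention guided by a shared context vector.

    V: (R, d) image region features; Q: (T, d) question word features.
    Returns attended image/question features and the attention weights.
    """
    a_v = softmax(V @ Wv @ ctx)   # weight per image region, sums to 1
    a_q = softmax(Q @ Wq @ ctx)   # weight per question word, sums to 1
    return a_v @ V, a_q @ Q, a_v, a_q

rng = np.random.default_rng(0)
d, R, T = 8, 6, 5
V = rng.standard_normal((R, d))        # image region features (hypothetical)
Q = rng.standard_normal((T, d))        # question word features (hypothetical)
Wv = rng.standard_normal((d, d)) * 0.1 # illustrative scoring weights
Wq = rng.standard_normal((d, d)) * 0.1

# Step 1: coarse attention, guided by a mean-pooled joint context.
ctx1 = np.tanh(V.mean(axis=0) + Q.mean(axis=0))
v1, q1, av1, aq1 = joint_attention_step(V, Q, ctx1, Wv, Wq)

# Step 2: refined attention, guided by the step-1 joint representation.
ctx2 = np.tanh(v1 + q1)
v2, q2, av2, aq2 = joint_attention_step(V, Q, ctx2, Wv, Wq)
```

The two calls mirror the coarse-to-fine progression the abstract describes: the second step's context is built from the first step's attended features, so both the visual and the question attention are refined jointly.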
Year
2017
DOI
10.1109/BIGCOM.2017.17
Venue
2017 3rd International Conference on Big Data Computing and Communications (BIGCOM)
Keywords
VQA,two-step joint attention,BiSRU
Field
Question answering,Joint attention,Interrogative word,Computer science,Visualization,Reference image,Visual attention,Natural language,Natural language processing,Artificial intelligence,Artificial neural network
DocType
Conference
ISBN
978-1-5386-3350-2
Citations
0
PageRank
0.34
References
9
Authors
5
Name            Order  Citations  PageRank
Weiming Zhang   1      83         15.80
Chunhong Zhang  2      14         6.37
Pei Liu         3      4          4.47
Zhiqiang Zhan   4      8          2.12
Xiaofeng Qiu    5      0          1.69