Abstract |
---|
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and more complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines for VQA are provided and compared with human performance. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/s11263-016-0966-6 | International Journal of Computer Vision |
Keywords | Field | DocType |
---|---|---|
Visual Question Answering | Question answering, Information retrieval, Computer science, Closed set, Natural language, Artificial intelligence, Natural language processing, Mirroring | Journal |
Volume | Issue | ISSN |
---|---|---|
123 | 1 | 0920-5691 |
Citations | PageRank | References |
---|---|---|
320 | 7.87 | 44 |
Authors |
---|
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Stanislaw Antol | 1 | 356 | 10.61 |
Aishwarya Agrawal | 2 | 360 | 10.62 |
Jiasen Lu | 3 | 544 | 16.43 |
Margaret Mitchell | 4 | 1450 | 65.37 |
Dhruv Batra | 5 | 2142 | 104.81 |
C. Lawrence Zitnick | 6 | 7321 | 332.72 |
Devi Parikh | 7 | 2929 | 132.01 |