Abstract
---

We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method maps textual queries and visual features from various regions into a shared space where they are compared for relevance with an inner product. Our method exhibits significant improvements in answering questions such as "what color," where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions. Our model is tested on the recently released VQA [1] dataset, which features free-form human-annotated questions and answers.
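The scoring mechanism the abstract describes (projecting the question and each candidate region into a shared embedding space, then ranking regions by inner product) can be sketched in a few lines. The sketch below is illustrative only: the feature dimensions, region count, and randomly initialized projection matrices are assumptions for demonstration, not the authors' trained parameters or exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: text feature, per-region image feature, shared space, regions.
d_text, d_img, d_shared = 300, 4096, 512
n_regions = 99

# Projection matrices into the shared space (learned in the actual model;
# random here purely for the sketch).
W_text = rng.normal(scale=0.01, size=(d_shared, d_text))
W_img = rng.normal(scale=0.01, size=(d_shared, d_img))

# Stand-ins for real features, e.g. pooled word vectors and CNN region features.
question_feat = rng.normal(size=d_text)
region_feats = rng.normal(size=(n_regions, d_img))

# Map both modalities into the shared space.
q = W_text @ question_feat          # (d_shared,)
r = region_feats @ W_img.T          # (n_regions, d_shared)

# Inner-product relevance scores, softmax-normalized into attention weights.
scores = r @ q                      # (n_regions,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Attend: a relevance-weighted sum of region features feeds answer prediction.
attended = weights @ region_feats   # (d_img,)
print(weights.argmax(), attended.shape)
```

A single inner product per region keeps the relevance computation cheap, so many candidate regions can be scored for each question.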
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/CVPR.2016.499 | 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Field | DocType | Volume
---|---|---
Question answering, Information retrieval, Pattern recognition, Computer science, Artificial intelligence | Journal | abs/1511.07394

Issue | ISSN | Citations
---|---|---
1 | 1063-6919 | 87

PageRank | References | Authors
---|---|---
2.28 | 19 | 3

Name | Order | Citations | PageRank |
---|---|---|---|
Kevin J. Shih | 1 | 183 | 8.77 |
Saurabh Singh | 2 | 860 | 33.24 |
Derek Hoiem | 3 | 4998 | 302.66 |