Scene Text Visual Question Answering - Citegraph

Paper Info

Title
Scene Text Visual Question Answering

Abstract
Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks that accounts for both reasoning errors and shortcomings of the text recognition module. In addition, we put forward a series of baseline methods, which provide further insight into the newly released dataset and set the scene for further research.

Field	Value
Year	2019
DOI	10.1109/ICCV.2019.00439
Venue	2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019)
DocType	Conference
ISSN	1550-5499
Citations	4
PageRank	0.41
References	0
Authors	8

Authors (8 rows)

Name	Order	Citations	PageRank
Ali Furkan Biten	1	9	2.18
Ruben Tito	2	10	2.18
Andrés Mafla	3	12	2.89
Lluís Gómez i Bigorda	4	6	2.48
Marçal Rusiñol	5	386	33.57
C. V. Jawahar	6	61	10.95
Ernest Valveny	7	647	41.65
Dimosthenis Karatzas	8	406	38.13