CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog. - Citegraph

Paper Info

Title
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog.

Abstract
Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks in isolation on large, real datasets is infeasible as it requires prohibitively-expensive complete annotation of the 'state' of all images and dialogs. We develop CLEVR-Dialog, a large diagnostic dataset for studying multi-round reasoning in visual dialog. Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset. This combination results in a dataset where all aspects of the visual dialog are fully annotated. In total, CLEVR-Dialog contains 5 instances of 10-round dialogs for about 85k CLEVR images, totaling to 4.25M question-answer pairs. We use CLEVR-Dialog to benchmark performance of standard visual dialog models; in particular, on visual coreference resolution (as a function of the coreference distance). This is the first analysis of its kind for visual dialog models that was not possible without this dataset. We hope the findings from CLEVR-Dialog will help inform the development of future models for visual dialog. Our dataset and code will be made public.
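
The dataset size quoted in the abstract can be checked with a quick calculation. The sketch below only multiplies the three figures the abstract states; the variable names are illustrative, and since the image count is given as "about 85k", the total is approximate.

```python
# Figures taken from the abstract; variable names are illustrative only.
dialogs_per_image = 5      # "5 instances" of dialogs per image
rounds_per_dialog = 10     # each dialog has 10 question-answer rounds
num_images = 85_000        # "about 85k" CLEVR images, so this is approximate

total_qa_pairs = dialogs_per_image * rounds_per_dialog * num_images
print(f"{total_qa_pairs:,}")  # prints 4,250,000
```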

Year	Venue	Field
2019	arXiv: Computer Vision and Pattern Recognition	Dialog box,Computer science,Artificial intelligence,Natural language processing
DocType	Volume	Citations
Journal	abs/1903.03166	1
PageRank	References	Authors
0.35	13	5

Authors (5 rows)

Name	Order	Citations	PageRank
Satwik Kottur	1	19	4.13
José M. F. Moura	2	1	1.03
Devi Parikh	3	2929	132.01
Dhruv Batra	4	2142	104.81
Marcus Rohrbach	5	3138	107.83