Abstract
---
We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from any specific downstream task to serve as a general test of machine intelligence, while being sufficiently grounded in vision to allow objective evaluation of individual responses and benchmarking of progress. We develop a novel two-person real-time chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial v0.9 has been released and consists of question-answer pairs from 10-round, human-human dialogs grounded in images from the COCO dataset.
Year | DOI | Venue
---|---|---
2017 | 10.1109/TPAMI.2018.2828437 | CVPR
Keywords | DocType | Volume
---|---|---
Visualization, Task analysis, Artificial intelligence, History, Protocols, Natural languages, Wheelchairs | Conference | abs/1611.08669
Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors
---
8
Name | Order | Citations | PageRank
---|---|---|---
Abhishek Das | 1 | 433 | 23.54
Satwik Kottur | 2 | 19 | 4.13
Khushi Gupta | 3 | 0 | 0.34
Avi Singh | 4 | 8 | 2.86
Deshraj Yadav | 5 | 0 | 0.34
José M. F. Moura | 6 | 5137 | 426.14
Devi Parikh | 7 | 2929 | 132.01
Dhruv Batra | 8 | 2142 | 104.81