Title
From Recognition To Cognition: Visual Commonsense Reasoning
Abstract
Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people's actions, goals, and mental states. While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world. We formalize this task as Visual Commonsense Reasoning. Given a challenging question about an image, a machine must answer correctly and then provide a rationale justifying its answer.
Next, we introduce a new dataset, VCR, consisting of 290k multiple choice QA problems derived from 110k movie scenes. The key recipe for generating non-trivial and high-quality problems at scale is Adversarial Matching, a new approach to transform rich annotations into multiple choice questions with minimal bias. Experimental results show that while humans find VCR easy (over 90% accuracy), state-of-the-art vision models struggle (~45%).
To move towards cognition-level understanding, we present a new reasoning engine, Recognition to Cognition Networks (R2C), that models the necessary layered inferences for grounding, contextualization, and reasoning. R2C helps narrow the gap between humans and machines (~65%); still, the challenge is far from solved, and we provide analysis that suggests avenues for future work.
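The task formulation above (answer a question, then justify it with a rationale) implies a two-stage evaluation. The following Python sketch scores predictions under the assumption that an item counts as solved only when both the chosen answer and the chosen rationale are correct. The `Prediction` record, the helper names, and the default of four candidate choices per stage are hypothetical illustrations, not the dataset's official evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-item record: chosen and gold option indices for each stage."""
    answer_pred: int
    answer_gold: int
    rationale_pred: int
    rationale_gold: int


def answer_accuracy(preds: Sequence[Prediction]) -> float:
    """Fraction of items whose chosen answer matches the gold answer."""
    if not preds:
        return 0.0
    return sum(p.answer_pred == p.answer_gold for p in preds) / len(preds)


def joint_accuracy(preds: Sequence[Prediction]) -> float:
    """Fraction of items where BOTH the answer and the rationale are correct."""
    if not preds:
        return 0.0
    hits = sum(
        p.answer_pred == p.answer_gold and p.rationale_pred == p.rationale_gold
        for p in preds
    )
    return hits / len(preds)


def chance_joint(num_answers: int = 4, num_rationales: int = 4) -> float:
    """Joint accuracy of independent uniform guessing (assumes 4 options per stage by default)."""
    return 1.0 / (num_answers * num_rationales)


if __name__ == "__main__":
    preds = [
        Prediction(0, 0, 2, 2),  # answer right, rationale right: counts
        Prediction(1, 1, 0, 3),  # answer right, rationale wrong: does not count
        Prediction(3, 2, 1, 1),  # answer wrong: does not count
    ]
    print(answer_accuracy(preds))  # 2 of 3 answers correct
    print(joint_accuracy(preds))   # 1 of 3 items fully correct
    print(chance_joint())          # 0.0625
```

Because the joint criterion multiplies the per-stage chances, a uniform random guesser with four options per stage would fully solve only 1/16 of items under these assumptions, which is why a joint score is a much stricter test than answer accuracy alone.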
Year
2018
DOI
10.1109/CVPR.2019.00688
Venue
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Field
Semantic reasoner, Computer science, Commonsense reasoning, Human–computer interaction, Pixel, Artificial intelligence, Cognition, Contextualization, Machine learning, Cognitive neuroscience of visual object recognition, Adversarial system, Multiple choice
DocType
Journal
Volume
abs/1811.10830
ISSN
1063-6919
Citations
23
PageRank
0.71
References
48
Authors
4
Name              Order  Citations  PageRank
Rowan G. Zellers  1      110        7.55
Yonatan Bisk      2      196        17.54
Ali Farhadi       3      4492       190.40
Yejin Choi        4      2239       153.18