Title
Embodied Question Answering
Abstract
We present a new AI task, Embodied Question Answering (EmbodiedQA), in which an agent is spawned at a random location in a 3D environment and asked a question ('What color is the car?'). To answer, the agent must first intelligently navigate to explore the environment, gather the necessary visual information through first-person (egocentric) vision, and then answer the question ('orange'). EmbodiedQA requires a range of AI skills: language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long-term memory, and grounding language into actions. In this work, we develop a dataset of questions and answers in House3D environments [1], evaluation metrics, and a hierarchical model trained with imitation and reinforcement learning.
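The episode protocol the abstract describes (spawn at a random location, receive a question, navigate from egocentric observations until the agent stops, then answer) can be pictured as a simple perception-action-answer loop. The sketch below is a toy illustration under assumed names: EqaEnv, run_episode, the action set, and the random/constant policies are hypothetical stand-ins for exposition, not the paper's model or the House3D API.

```python
import random

# Hypothetical discrete navigation action set (illustration only).
ACTIONS = ["forward", "turn-left", "turn-right", "stop"]

class EqaEnv:
    """Toy stand-in for a 3D house environment with egocentric frames."""

    def __init__(self, question, max_steps=50):
        self.question = question
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        """Spawn the agent at a random location; return the first frame."""
        self.steps = 0
        return self._render()

    def step(self, action):
        """Apply a navigation action; return (frame, done)."""
        self.steps += 1
        done = action == "stop" or self.steps >= self.max_steps
        return self._render(), done

    def _render(self):
        # A real environment would return an egocentric RGB frame;
        # here we return a text placeholder.
        return f"frame@t={self.steps}"

def run_episode(env, policy, answerer):
    """Navigate until the agent stops, then answer from what it saw."""
    frames = [env.reset()]
    done = False
    while not done:
        action = policy(env.question, frames)
        frame, done = env.step(action)
        frames.append(frame)
    return answerer(env.question, frames)

# Trivial random policy and constant answerer, just to exercise the loop;
# the paper instead learns these with imitation and reinforcement learning.
policy = lambda question, frames: random.choice(ACTIONS)
answerer = lambda question, frames: "orange"

env = EqaEnv(question="What color is the car?")
print(run_episode(env, policy, answerer))
```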
Year: 2018
DOI: 10.1109/CVPR.2018.00008
Venue: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Keywords: random location, first-person vision, visual recognition, goal-driven navigation, embodied question answering, House3D environments, commonsense reasoning, long-term memory, reinforcement learning, evaluation metrics, hierarchical model training, AI skills
DocType: Conference
ISSN: 1063-6919
ISBN: 978-1-5386-6421-6
Citations: 2
PageRank: 0.40
References: 5
Authors: 6
Name               Order  Citations  PageRank
Abhishek Das       1      433        23.54
Samyak Datta       2      12         2.59
Georgia Gkioxari   3      420        31.64
Stefan Lee         4      231        19.88
Devi Parikh        5      2929       132.01
Dhruv Batra        6      2142       104.81