Abstract
---
We present a new AI task - Embodied Question Answering (EmbodiedQA) - where an agent is spawned at a random location in a 3D environment and asked a question ('What color is the car?'). In order to answer, the agent must first intelligently navigate to explore the environment, gather necessary visual information through first-person (egocentric) vision, and then answer the question ('orange'). EmbodiedQA requires a range of AI skills - language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long-term memory, and grounding language into actions. In this work, we develop a dataset of questions and answers in House3D environments [1], evaluation metrics, and a hierarchical model trained with imitation and reinforcement learning.
Year | DOI | Venue
---|---|---
2018 | 10.1109/CVPR.2018.00008 | 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Keywords | DocType | ISSN
---|---|---
random location, first-person vision, visual recognition, goal-driven navigation, embodied question answering, House3D environments, commonsense reasoning, long-term memory, reinforcement learning, evaluation metrics, hierarchical model training, AI skills | Conference | 1063-6919
ISBN | Citations | PageRank
---|---|---
978-1-5386-6421-6 | 2 | 0.40
References | Authors
---|---
5 | 6
Name | Order | Citations | PageRank
---|---|---|---
Abhishek Das | 1 | 433 | 23.54 |
Samyak Datta | 2 | 12 | 2.59 |
Georgia Gkioxari | 3 | 420 | 31.64 |
Stefan Lee | 4 | 231 | 19.88 |
Devi Parikh | 5 | 2929 | 132.01 |
Dhruv Batra | 6 | 2142 | 104.81 |