Abstract |
---|
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. In this paper, we study how to address three critical challenges for this task: the cross-modal grounding, the ill-posed feedback, and the generalization problems. First, we propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces ... |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/TPAMI.2020.2972281 | IEEE Transactions on Pattern Analysis and Machine Intelligence |

Keywords | DocType | Volume |
---|---|---|
Navigation, Visualization, Trajectory, Task analysis, Cognition, Grounding, Natural languages | Journal | 43 |

Issue | ISSN | Citations |
---|---|---|
12 | 0162-8828 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 7 | 8 |

Name | Order | Citations | PageRank |
---|---|---|---|
Xin Wang | 1 | 0 | 0.34 |
Qiuyuan Huang | 2 | 176 | 17.66 |
Asli Çelikyilmaz | 3 | 407 | 39.06 |
Jianfeng Gao | 4 | 5729 | 296.43 |
Dinghan Shen | 5 | 108 | 10.37 |
Yuan-Fang Wang | 6 | 0 | 0.34 |
William Yang Wang | 7 | 493 | 59.64 |
Lei Zhang | 8 | 2533 | 164.29 |