Learning Disentanglement with Decoupled Labels for Vision-Language Navigation. - Citegraph

Paper Info

Title
Learning Disentanglement with Decoupled Labels for Vision-Language Navigation.

Abstract
Vision-and-Language Navigation (VLN) requires an agent to follow complex natural language instructions and perceive the visual environment for real-world navigation. Intuitively, we find that instruction disentanglement for each viewpoint along the agent’s path is critical for accurate navigation. However, most methods only utilize the whole complex instruction or inaccurate sub-instructions due to the lack of accurate disentanglement as an intermediate supervision stage. To address this problem, we propose a new Disentanglement framework with Decoupled Labels (DDL) for VLN. First, we manually extend the benchmark dataset Room-to-Room with landmark- and action-aware labels in order to provide fine-grained information for each viewpoint. Furthermore, to enhance the generalization ability, we propose a Decoupled Label Speaker module to generate pseudo-labels for augmented data and reinforcement training. To fully use the proposed fine-grained labels, we design a Disentangled Decoding Module to guide discriminative feature extraction and aid multi-modal alignment. To demonstrate the generality of our proposed method, we apply it to an LSTM-based model and two recent Transformer-based models. Extensive experiments on two VLN benchmarks (i.e., R2R and R4R) demonstrate the effectiveness of our approach, achieving better performance than previous state-of-the-art methods.

Year	DOI	Venue
2022	10.1007/978-3-031-20059-5_18	European Conference on Computer Vision
Keywords	DocType	Citations
Vision-and-Language Navigation,Disentanglement,Modular network,Imitation/Reinforcement learning,LSTM and Transformer	Conference	0
PageRank	References	Authors
0.34	0	4

Authors (4 rows)

Name	Order	Citations	PageRank
Wenhao Cheng	1	0	0.34
Xingping Dong	2	0	0.34
Salman Khan	3	387	41.05
Jianbing Shen	4	584	33.35
