Title
Real-time Vision-Language-Navigation based on a Lite Pre-training Model
Abstract
Vision-Language-Navigation (VLN) is a challenging task that requires a robot to move autonomously to a destination, following natural language instructions from humans and relying on its visual observations. This paper presents a lite model based on a pre-training method that can handle real-time VLN tasks. Unlike previous traditional methods, our model achieves better performance and generalization by adopting a pre-training approach. We introduce factorization and parameter sharing into the PREVALENT model. These two lightweight techniques reduce the embedding parameters by 75% and the whole model's parameters by 77%, while saving about 17% of training time and 72.2% of inference time. At the same time, the performance of the original model is maintained: the success rate (SR) and success rate weighted by path length (SPL) are consistent with the original model on the seen validation set (Seen Val), with only a slight loss of about 1%-2% on the unseen validation set (Unseen Val).
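The abstract names two lightweight techniques, embedding factorization and parameter sharing, without showing how they cut parameter counts. Below is a minimal sketch, not the authors' code, of both ideas in the style popularized by ALBERT: the vocabulary is embedded into a small dimension E and projected up to the hidden dimension H, and one transformer layer is reused across depth instead of stacking N distinct layers. All class names, sizes, and the layer structure are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of factorized embeddings and cross-layer parameter
# sharing. Sizes (vocab, E, H, heads, layers) are assumed, not from the paper.
import torch
import torch.nn as nn


class FactorizedEmbedding(nn.Module):
    """Embed tokens into a small dim E, then project to the hidden dim H.

    Parameter count drops from V*H to V*E + E*H, a large saving when E << H.
    """

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # V x E table
        self.project = nn.Linear(embed_dim, hidden_dim)   # E x H projection

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.project(self.embed(token_ids))


class SharedEncoder(nn.Module):
    """Apply one transformer layer num_layers times with the same weights."""

    def __init__(self, hidden_dim: int, num_heads: int, num_layers: int):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True
        )
        self.num_layers = num_layers

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.num_layers):  # reuse, not restack
            x = self.layer(x)
        return x


if __name__ == "__main__":
    vocab, E, H = 30522, 128, 768
    tokens = torch.randint(0, vocab, (2, 16))  # (batch, seq_len)
    x = FactorizedEmbedding(vocab, E, H)(tokens)
    out = SharedEncoder(H, num_heads=12, num_layers=12)(x)
    print(out.shape)  # torch.Size([2, 16, 768])
```

With these illustrative sizes, the embedding table shrinks from V*H (about 23.4M parameters) to V*E + E*H (about 4.0M); the paper reports a 75% embedding reduction under its own configuration.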
Year
2020
DOI
10.1109/iThings-GreenCom-CPSCom-SmartData-Cybermatics50389.2020.00077
Venue
2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics)
Keywords
deep learning, natural language processing, real-time system
DocType
Conference
ISBN
978-1-7281-7648-2
Citations
0
PageRank
0.34
References
0
Authors
7

Name           Order   Citations   PageRank
Jitao Huang    1       0           0.34
Bo Huang       2       0           2.70
Liangqi Zhu    3       0           0.34
Liyuan Ma      4       0           0.34
Jin Liu        5       316         50.24
Guohui Zeng    6       0           0.34
Zhicai Shi     7       0           0.34