Title
Real-time Vision-Language-Navigation based on a Lite Pre-training Model
Abstract
Vision-Language-Navigation (VLN) is a challenging task that requires a robot to move autonomously to a destination, following natural language instructions from humans and relying on its visual observations. This paper presents a lite model based on a pre-training method that can handle real-time VLN tasks. Unlike previous traditional methods, our model achieves better performance and generalization by adopting a pre-training approach. We introduce factorization and parameter sharing into the PREVALENT model. These two lightweight techniques reduce the embedding parameters by 75% and the whole model's parameters by 77%, while saving about 17% of training time and 72.2% of inference time. At the same time, the performance of the original model is maintained: the success rate (SR) and success rate weighted by path length (SPL) are consistent with the original model on the seen validation set (Seen Val), with only a slight loss of about 1%-2% on the unseen validation set (Unseen Val).
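The abstract names two lightweight techniques, embedding factorization and parameter sharing, without showing how they cut parameter counts. Below is a minimal sketch, not the authors' code, of both ideas in the style popularized by ALBERT: the vocabulary is embedded into a small dimension E and projected up to the hidden dimension H, and one transformer layer is reused across depth instead of stacking N distinct layers. All class names, sizes, and the layer structure are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of factorized embeddings and cross-layer parameter
# sharing. Sizes (vocab, E, H, heads, layers) are assumed, not from the paper.
import torch
import torch.nn as nn


class FactorizedEmbedding(nn.Module):
    """Embed tokens into a small dim E, then project to the hidden dim H.

    Parameter count drops from V*H to V*E + E*H, a large saving when E << H.
    """

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # V x E table
        self.project = nn.Linear(embed_dim, hidden_dim)   # E x H projection

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.project(self.embed(token_ids))


class SharedEncoder(nn.Module):
    """Apply one transformer layer num_layers times with the same weights."""

    def __init__(self, hidden_dim: int, num_heads: int, num_layers: int):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True
        )
        self.num_layers = num_layers

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.num_layers):  # reuse, not restack
            x = self.layer(x)
        return x


if __name__ == "__main__":
    vocab, E, H = 30522, 128, 768
    tokens = torch.randint(0, vocab, (2, 16))  # (batch, seq_len)
    x = FactorizedEmbedding(vocab, E, H)(tokens)
    out = SharedEncoder(H, num_heads=12, num_layers=12)(x)
    print(out.shape)  # torch.Size([2, 16, 768])
```

With these illustrative sizes, the embedding table shrinks from V*H (about 23.4M parameters) to V*E + E*H (about 4.0M); the paper reports a 75% embedding reduction under its own configuration.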
Year
2020
DOI
10.1109/iThings-GreenCom-CPSCom-SmartData-Cybermatics50389.2020.00077
Venue
2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics)
Keywords
deep learning, natural language processing, real-time system
DocType
Conference
ISBN
978-1-7281-7648-2
Citations
0
PageRank
0.34
References
0
Authors
7

Name           Order   Citations   PageRank
Jitao Huang    1       0           0.34
Bo Huang       2       0           2.70
Liangqi Zhu    3       0           0.34
Liyuan Ma      4       0           0.34
Jin Liu        5       316         50.24
Guohui Zeng    6       0           0.34
Zhicai Shi     7       0           0.34