Abstract
---
Vision-and-Language Navigation (VLN) is a challenging task that requires a robot to autonomously move to a destination based on visual observations, following natural language instructions given by humans. This paper presents a lightweight model based on the pre-training method that can handle real-time VLN tasks. Unlike previous traditional methods, our model achieves better performance and generalization by adopting a pre-training approach. We introduce factorization and parameter sharing into the PREVALENT model. These two lightweight techniques reduce the embedding parameters by 75% and the total model parameters by 77%, saving about 17% of training time and 72.2% of inference time. At the same time, the performance of the original model is maintained: the success rate (SR) and Success weighted by Path Length (SPL) are consistent with the original model on the seen validation set (Val Seen), with a slight performance loss of about 1%-2% on the unseen validation set (Val Unseen). |
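The parameter savings from embedding factorization can be illustrated with a quick count. This is a minimal sketch of the general technique (factorizing a V×H embedding table into a V×E lookup plus an E×H projection, as popularized by ALBERT); the sizes V, H, and E below are assumed for illustration and are not taken from the paper.

```python
# Hypothetical sizes (not from the paper): vocabulary V, hidden size H,
# and a small factorized embedding dimension E.
V = 30000   # vocabulary size (assumed)
H = 768     # hidden size (assumed, BERT-base-like)
E = 128     # factorized embedding dimension (assumed)

# Standard embedding table maps tokens directly to the hidden size.
full = V * H
# Factorized version: small V x E lookup followed by an E x H projection.
factorized = V * E + E * H

reduction = 1 - factorized / full
print(f"full: {full:,}  factorized: {factorized:,}  reduction: {reduction:.1%}")
```

Because V is much larger than E, the V×E lookup dominates the factorized count, so shrinking E drives the reduction; parameter sharing across transformer layers then cuts the non-embedding parameters on top of this.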
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/iThings-GreenCom-CPSCom-SmartData-Cybermatics50389.2020.00077 | 2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics) |
Keywords | DocType | ISBN
---|---|---|
deep learning, natural language processing, real-time system | Conference | 978-1-7281-7648-2
Citations | PageRank | References
---|---|---|
0 | 0.34 | 0
Authors
---
7
Name | Order | Citations | PageRank |
---|---|---|---|
Jitao Huang | 1 | 0 | 0.34 |
Bo Huang | 2 | 0 | 2.70 |
Liangqi Zhu | 3 | 0 | 0.34 |
Liyuan Ma | 4 | 0 | 0.34 |
Jin Liu | 5 | 316 | 50.24 |
Guohui Zeng | 6 | 0 | 0.34 |
Zhicai Shi | 7 | 0 | 0.34 |