Title
Residual attention-based LSTM for video captioning
Abstract
Recently, frameworks built on hierarchical LSTMs, such as stacked LSTM networks, have achieved great success in video captioning. However, once deeper LSTM layers begin to converge, a degradation problem is exposed: as the number of LSTM layers increases, accuracy saturates and then degrades rapidly, much as in standard deep convolutional networks such as VGG. In this paper, we propose a novel attention-based framework, namely Residual Attention-based LSTM (Res-ATT), which not only takes advantage of the existing attention mechanism but also considers the importance of sentence-internal information, which is usually lost during transmission. Our key novelty is showing how to integrate residual mapping into a hierarchical LSTM network to solve the degradation problem. More specifically, our hierarchical architecture builds on two LSTM layers, and residual mapping is introduced to avoid losing previously generated word information (i.e., both content information and relationship information). Experimental results on the mainstream datasets MSVD and MSR-VTT show that our framework outperforms state-of-the-art approaches. Furthermore, our automatically generated sentences provide more detailed information that precisely describes a video.
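The abstract describes the core mechanism only at a high level. Below is a minimal PyTorch-style sketch of one decoder step with two LSTM layers, temporal soft attention over frame features, and a residual (skip) connection that re-injects the first layer's output, which carries the previously generated word information, after the attention stage. All module names, dimensions, and the exact placement of the skip connection are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of a residual attention-based
# two-layer LSTM decoder step for video captioning, as outlined in the abstract.
import torch
import torch.nn as nn


class ResAttDecoderStep(nn.Module):
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512, feat_dim=1536):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm1 = nn.LSTMCell(embed_dim, hidden_dim)               # sentence-level LSTM
        self.att_w = nn.Linear(hidden_dim + feat_dim, 1)              # soft attention scorer
        self.lstm2 = nn.LSTMCell(hidden_dim + feat_dim, hidden_dim)   # visual/fusion LSTM
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, feats, state1, state2):
        # prev_word: (B,) previous word ids; feats: (B, T, feat_dim) frame features
        x = self.embed(prev_word)                                     # (B, embed_dim)
        h1, c1 = self.lstm1(x, state1)                                # first LSTM layer

        # temporal soft attention over frame features, conditioned on h1
        q = h1.unsqueeze(1).expand(-1, feats.size(1), -1)             # (B, T, hidden_dim)
        alpha = torch.softmax(self.att_w(torch.cat([q, feats], dim=-1)), dim=1)
        ctx = (alpha * feats).sum(dim=1)                              # (B, feat_dim)

        h2, c2 = self.lstm2(torch.cat([h1, ctx], dim=-1), state2)     # second LSTM layer

        # residual mapping: add the first layer's output (previously generated
        # word information) back in before predicting the next word
        logits = self.out(h2 + h1)
        return logits, (h1, c1), (h2, c2)
```

The skip connection is the part intended to counter the degradation problem: information about already-generated words reaches the output projection directly, instead of having to survive the attention and second-layer transformations.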
Year
2019
DOI
10.1007/s11280-018-0531-z
Venue
World Wide Web
Keywords
LSTM, Attention mechanism, Residual thought, Video captioning
Field
Residual, Closed captioning, Computer science, Degradation problem, Artificial intelligence, Novelty, Sentence, Machine learning
DocType
Journal
Volume
22
Issue
SP2
ISSN
1573-1413
Citations
7
PageRank
0.43
References
31
Authors
4
Name           Order  Citations  PageRank
Xiangpeng Li   1      99         16.53
Zhilong Zhou   2      8          1.12
Lijiang Chen   3      304        23.22
Lianli Gao     4      550        42.85