Title
Residual attention-based LSTM for video captioning
Abstract
Recently, frameworks built on hierarchical LSTMs, such as stacked LSTM networks, have achieved great success in video captioning. However, once deeper LSTM layers begin to converge, a degradation problem is exposed: as the number of LSTM layers increases, accuracy saturates and then degrades rapidly, much as in standard deep convolutional networks such as VGG. In this paper, we propose a novel attention-based framework, namely Residual Attention-based LSTM (Res-ATT), which not only takes advantage of the existing attention mechanism but also considers the importance of sentence-internal information, which is usually lost during transmission. Our key novelty is showing how to integrate residual mapping into a hierarchical LSTM network to solve the degradation problem. More specifically, our hierarchical architecture builds on two LSTM layers, and residual mapping is introduced to avoid losing previously generated word information (i.e., both content information and relationship information). Experimental results on the mainstream datasets MSVD and MSR-VTT show that our framework outperforms state-of-the-art approaches. Furthermore, our automatically generated sentences provide more detailed information that precisely describes a video.
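The abstract describes the core mechanism only at a high level. Below is a minimal PyTorch-style sketch of one decoder step with two LSTM layers, temporal soft attention over frame features, and a residual (skip) connection that re-injects the first layer's output, which carries the previously generated word information, after the attention stage. All module names, dimensions, and the exact placement of the skip connection are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of a residual attention-based
# two-layer LSTM decoder step for video captioning, as outlined in the abstract.
import torch
import torch.nn as nn


class ResAttDecoderStep(nn.Module):
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512, feat_dim=1536):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm1 = nn.LSTMCell(embed_dim, hidden_dim)               # sentence-level LSTM
        self.att_w = nn.Linear(hidden_dim + feat_dim, 1)              # soft attention scorer
        self.lstm2 = nn.LSTMCell(hidden_dim + feat_dim, hidden_dim)   # visual/fusion LSTM
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, feats, state1, state2):
        # prev_word: (B,) previous word ids; feats: (B, T, feat_dim) frame features
        x = self.embed(prev_word)                                     # (B, embed_dim)
        h1, c1 = self.lstm1(x, state1)                                # first LSTM layer

        # temporal soft attention over frame features, conditioned on h1
        q = h1.unsqueeze(1).expand(-1, feats.size(1), -1)             # (B, T, hidden_dim)
        alpha = torch.softmax(self.att_w(torch.cat([q, feats], dim=-1)), dim=1)
        ctx = (alpha * feats).sum(dim=1)                              # (B, feat_dim)

        h2, c2 = self.lstm2(torch.cat([h1, ctx], dim=-1), state2)     # second LSTM layer

        # residual mapping: add the first layer's output (previously generated
        # word information) back in before predicting the next word
        logits = self.out(h2 + h1)
        return logits, (h1, c1), (h2, c2)
```

The skip connection is the part intended to counter the degradation problem: information about already-generated words reaches the output projection directly, instead of having to survive the attention and second-layer transformations.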
Year
2019
DOI
10.1007/s11280-018-0531-z
Venue
World Wide Web
Keywords
LSTM, Attention mechanism, Residual thought, Video captioning
Field
Residual, Closed captioning, Computer science, Degradation problem, Artificial intelligence, Novelty, Sentence, Machine learning
DocType
Journal
Volume
22
Issue
SP2
ISSN
1573-1413
Citations
7
PageRank
0.43
References
31
Authors
4
Name           Order  Citations  PageRank
Xiangpeng Li   1      99         16.53
Zhilong Zhou   2      8          1.12
Lijiang Chen   3      304        23.22
Lianli Gao     4      550        42.85