Abstract |
---|
The dominant video captioning methods employ an attentional encoder-decoder architecture in which the decoder is autoregressive, generating sentences from left to right. However, these methods generally suffer from the exposure bias issue and neglect the guidance of future output contexts obtained from right-to-left decoding. Here, the authors propose a new symmetric bidirectional decoder for video captioning. They first integrate self-attentive multi-head attention with a bidirectional gated recurrent unit to capture long-term semantic dependencies in videos. They then apply a single decoder to generate accurate descriptions from left to right and from right to left simultaneously. In each decoding direction, the decoder performs two cross-attentive multi-head attention modules to consider, at each time step, both the past hidden states from the same decoding direction and the future hidden states from the reverse decoding direction. A symmetric semantic-guided gated attention module is specially devised to adaptively suppress irrelevant or misleading content in the past or future output contexts while retaining the useful parts, thereby avoiding under-description. Experimental evaluations on two widely used benchmark datasets, Microsoft Research Video to Text and the Microsoft Video Description corpus, demonstrate that the proposed method achieves state-of-the-art performance, validating the superiority of the bidirectional decoder. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1049/cvi2.12043 | IET COMPUTER VISION |

DocType | Volume | Issue |
---|---|---|
Journal | 15 | 4 |

ISSN | Citations | PageRank |
---|---|---|
1751-9632 | 0 | 0.34 |

References | Authors |
---|---|
0 | 2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Shanshan Qi | 1 | 0 | 0.34 |
Luxi Yang | 2 | 1180 | 118.08 |