Title: Temporal Attention Feature Encoding for Video Captioning
Abstract
In this paper, we propose a novel video captioning algorithm comprising a feature encoder (FENC) and a decoder architecture that provides a more accurate and richer representation. Our network incorporates feature temporal attention (FTA) to efficiently embed important events into a feature vector. In FTA, the proposed feature is given as a weighted fusion of video features extracted from a 3D CNN, which allows the decoder to know when a feature is activated. In the decoder, feature word attention (FWA) similarly weights the elements of the encoded feature vector, determining which elements should be activated to generate the appropriate word. Training is further facilitated by a new loss function that reduces the variance of word frequencies. Experimental results demonstrate that the proposed algorithm outperforms conventional algorithms on VATEX, a recent large-scale dataset for long-term video sentence generation.
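The FTA step described above can be illustrated with a minimal sketch: per-segment 3D-CNN features are scored against a decoder state and fused by softmax weights. The shapes, the `query` vector, and the bilinear scoring matrix `W` are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def feature_temporal_attention(features, query, W):
    """Weighted fusion of T segment-level video features.

    features: (T, D) array of per-segment 3D-CNN features
    query:    (H,) decoder state used to score segments (hypothetical)
    W:        (H, D) bilinear scoring matrix (hypothetical)
    Returns the fused (D,) feature and the (T,) attention weights.
    """
    scores = query @ W @ features.T      # (T,) relevance of each segment
    alpha = softmax(scores)              # attention weights, sum to 1
    fused = alpha @ features             # (D,) weighted fusion
    return fused, alpha
```

The fused vector emphasizes segments whose features align with the current decoding context, which is the intuition behind letting the decoder "know when the feature is activated".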
Year: 2020
Venue: APSIPA
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 3

Name          | Order | Citations | PageRank
Na-Young Kim  | 1     | 0         | 1.35
Seong Jong Ha | 2     | 0         | 0.34
Je-Won Kang   | 3     | 9         | 6.87