Hierarchical Deep Recurrent Architecture for Video Understanding. - Citegraph

Paper Info

Title
Hierarchical Deep Recurrent Architecture for Video Understanding.

Abstract
This paper introduces the system we developed for the YouTube-8M Video Understanding Challenge, in which a large-scale benchmark dataset was used for multi-label video classification. The proposed framework is a hierarchical deep architecture consisting of a frame-level sequence modeling part and a video-level classification part. In the frame-level sequence modeling part, we explore a set of methods, including Pooling-LSTM (PLSTM), Hierarchical-LSTM (HLSTM), and Random-LSTM (RLSTM), to address the problem of the large number of frames in a video. We also introduce two attention pooling methods, single attention pooling (ATT) and multiple attention pooling (Multi-ATT), so that the model can pay more attention to the informative frames in a video and ignore the uninformative ones. In the video-level classification part, two methods are proposed to improve classification performance: Hierarchical-Mixture-of-Experts (HMoE) and Classifier Chains (CC). Our final submission is an ensemble of 18 sub-models. In terms of the official evaluation metric, Global Average Precision (GAP) at 20, our best submission achieves 0.84346 on the public 50% of the test data and 0.84333 on the private 50% of the test data.
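
The abstract reports results in GAP at 20 without spelling the metric out. The following is a minimal sketch of how that metric is commonly described for this challenge, not the authors' or the organizers' evaluation code: each video contributes its 20 highest-scoring labels, all kept predictions are pooled and ranked by confidence, and average precision is computed over the pooled list. The function name `gap_at_k` and the choice to divide by the total number of ground-truth positives are assumptions made for illustration.

```python
def gap_at_k(scores, labels, k=20):
    """Sketch of Global Average Precision at k (hypothetical helper).

    scores: list of per-video lists of predicted confidences, one per class.
    labels: list of per-video lists of 0/1 ground-truth labels, same shape.
    """
    pooled = []      # (confidence, label) pairs kept from all videos
    total_pos = 0    # assumption: recall is measured against all positives
    for video_scores, video_labels in zip(scores, labels):
        total_pos += sum(video_labels)
        ranked = sorted(zip(video_scores, video_labels),
                        key=lambda pair: pair[0], reverse=True)
        pooled.extend(ranked[:k])  # keep only the top-k labels of this video
    if total_pos == 0:
        return 0.0
    # Rank the pooled predictions of all videos by confidence.
    pooled.sort(key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(pooled, start=1):
        if label:
            hits += 1
            ap += hits / rank  # precision at this rank, counted at each hit
    return ap / total_pos


# Toy usage with two videos, three classes, and k=2.
toy_scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
toy_labels = [[1, 0, 0], [0, 1, 1]]
toy_gap = gap_at_k(toy_scores, toy_labels, k=2)
```

Because predictions from all videos share one ranking, the metric rewards confidences that are calibrated across videos, not only a correct ordering within each video.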

Year	Venue	Field
2017	arXiv: Computer Vision and Pattern Recognition	Classifier chains,Architecture,Computer science,Pooling,Sequence modeling,Artificial intelligence,Test data,Machine learning
DocType	Volume	Citations
Journal	abs/1707.03296	0
PageRank	References	Authors
0.34	6	4

Authors (4 rows)

Name	Order	Citations	PageRank
Luming Tang	1	20	2.05
Boyang Deng	2	27	2.84
Haiyu Zhao	3	65	6.28
Shuai Yi	4	167	14.21
