Title: Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification
Abstract
This paper presents a novel framework to combine multiple layers and modalities of deep neural networks for video classification. We first propose a multilayer strategy to simultaneously capture a variety of levels of abstraction and invariance in a network, where the convolutional and fully connected layers are effectively represented by our proposed feature aggregation methods. We further introduce a multimodal scheme that includes four highly complementary modalities to extract diverse static and dynamic cues at multiple temporal scales. In particular, for modeling long-term temporal information, we propose a new structure, FC-RNN, to effectively transform pre-trained fully connected layers into recurrent layers. A robust boosting model is then introduced to optimize the fusion of multiple layers and modalities in a unified way. In extensive experiments, we achieve state-of-the-art results on two public benchmark datasets: UCF101 and HMDB51.
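The FC-RNN idea summarized in the abstract — converting a pre-trained fully connected layer into a recurrent layer — can be sketched roughly as follows. The layer dimensions, ReLU activation, random weights, and the exact recurrence h_t = f(W_fc x_t + U h_{t-1} + b_fc) are illustrative assumptions for this sketch, not the paper's verified configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained fully connected layer: y = relu(W_fc @ x + b_fc).
# (Weights here are random placeholders for pre-trained parameters.)
in_dim, hid_dim = 8, 4
W_fc = rng.standard_normal((hid_dim, in_dim)) * 0.1
b_fc = np.zeros(hid_dim)

# FC-RNN sketch: reuse the pre-trained input-to-hidden weights W_fc and
# add only a newly initialized hidden-to-hidden matrix U, so the layer
# becomes recurrent:  h_t = f(W_fc x_t + U h_{t-1} + b_fc)
U = rng.standard_normal((hid_dim, hid_dim)) * 0.1

def fc_rnn_step(x_t, h_prev):
    """One recurrent step; ReLU is an assumed activation."""
    return np.maximum(0.0, W_fc @ x_t + U @ h_prev + b_fc)

# Unroll over a short sequence of (random stand-in) per-frame features.
h = np.zeros(hid_dim)
for _ in range(5):
    x_t = rng.standard_normal(in_dim)
    h = fc_rnn_step(x_t, h)
```

The appeal of this construction is that the recurrent layer starts from the pre-trained static representation, so only the hidden-to-hidden weights must be learned from scratch.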
Year: 2016
DOI: 10.1145/2964284.2964297
Venue: ACM Multimedia
Keywords: Video Classification, Deep Neural Networks, Boosting, Fusion, CNN, RNN
Field: Modalities, Temporal scales, Abstraction, Invariant (physics), Computer science, Fusion, Boosting (machine learning), Artificial intelligence, Feature aggregation, Deep neural networks, Machine learning
DocType: Conference
Citations: 24
PageRank: 0.78
References: 30
Authors (3)
Name | Order | Citations | PageRank
Xiaodong Yang | 1 | 1094 | 41.92
Pavlo O. Molchanov | 2 | 198 | 11.96
Jan Kautz | 3 | 3615 | 198.77