Abstract
---
This paper presents a novel framework to combine multiple layers and modalities of deep neural networks for video classification. We first propose a multilayer strategy to simultaneously capture a variety of levels of abstraction and invariance in a network, where the convolutional and fully connected layers are effectively represented by our proposed feature aggregation methods. We further introduce a multimodal scheme that includes four highly complementary modalities to extract diverse static and dynamic cues at multiple temporal scales. In particular, for modeling the long-term temporal information, we propose a new structure, FC-RNN, to effectively transform pre-trained fully connected layers into recurrent layers. A robust boosting model is then introduced to optimize the fusion of multiple layers and modalities in a unified way. In extensive experiments, we achieve state-of-the-art results on two public benchmark datasets: UCF101 and HMDB51.
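The abstract's FC-RNN idea, turning a pre-trained fully connected layer into a recurrent layer, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`fc_to_rnn`, `fc_rnn_step`), the ReLU nonlinearity, and the small-random initialization of the new recurrent matrix are assumptions for the sketch; the core idea shown is that the pre-trained weights `W_fc`, `b_fc` become the input-to-hidden parameters and only the hidden-to-hidden matrix `U_r` is introduced fresh.

```python
import numpy as np

def fc_to_rnn(W_fc, b_fc, seed=0):
    """Create the only new parameter needed to recurrentize an FC layer:
    a hidden-to-hidden matrix U_r (here: small random init, an assumption).
    W_fc (n_out x n_in) and b_fc (n_out) are reused unchanged."""
    rng = np.random.default_rng(seed)
    n_out = W_fc.shape[0]
    return 0.01 * rng.standard_normal((n_out, n_out))

def fc_rnn_step(x_t, h_prev, W_fc, b_fc, U_r):
    """One FC-RNN step: h_t = f(W_fc @ x_t + U_r @ h_prev + b_fc).
    With U_r = 0 and h_prev = 0 this reduces to the original FC layer."""
    return np.maximum(0.0, W_fc @ x_t + U_r @ h_prev + b_fc)

# Example: run a sequence of per-frame features through the layer.
W = np.ones((4, 3))          # stand-in for pre-trained FC weights
b = np.zeros(4)
U = fc_to_rnn(W, b)
h = np.zeros(4)              # initial hidden state
for x in [np.ones(3), 2 * np.ones(3)]:   # two "frames" of features
    h = fc_rnn_step(x, h, W, b, U)
```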
Year | DOI | Venue |
---|---|---
2016 | 10.1145/2964284.2964297 | ACM Multimedia |
Keywords | Field | DocType
---|---|---
Video Classification, Deep Neural Networks, Boosting, Fusion, CNN, RNN | Modalities, Temporal scales, Abstraction, Invariant (physics), Computer science, Fusion, Boosting (machine learning), Artificial intelligence, Feature aggregation, Deep neural networks, Machine learning | Conference
Citations | PageRank | References
---|---|---
24 | 0.78 | 30
Authors
---
3
Name | Order | Citations | PageRank |
---|---|---|---
Xiaodong Yang | 1 | 1094 | 41.92 |
Pavlo O. Molchanov | 2 | 198 | 11.96 |
Jan Kautz | 3 | 3615 | 198.77 |