Title
Video-level Multi-model Fusion for Action Recognition
Abstract
Approaches based on spatio-temporal features, such as two-stream methods and 3D convolution methods, have emerged for video action recognition. However, current methods suffer from partial observation or are restricted to modeling a single type of information. Segment-level recognition results obtained from dense sampling cannot represent the entire video and therefore lead to partial observation. Moreover, it is hard for a single model to capture complementary spatial, temporal, and spatio-temporal information from a video at the same time. The challenge is therefore to build a video-level representation and to capture multiple types of information. In this paper, a video-level multi-model fusion method for action recognition is proposed to solve these problems. First, an efficient video-level 3D convolution model, which assembles segment-level 3D convolution models, is proposed to capture the global information in the video. Second, a multi-model fusion architecture is proposed for video action recognition to capture multiple types of information. The spatial, temporal, and spatio-temporal information is aggregated with an SVM classifier. Experimental results show that this method achieves state-of-the-art performance on the UCF-101 dataset (97.6%) without pre-training on Kinetics.
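The two ideas in the abstract, video-level aggregation of segment-level outputs and fusion of three complementary models, can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: segment-level class scores are combined by simple averaging, the three models' video-level scores are concatenated into one feature vector, and the names `video_level_scores` and `fusion_feature` are hypothetical. The SVM step is omitted here. In practice the resulting vectors could be passed to an off-the-shelf SVM such as scikit-learn's `sklearn.svm.SVC`.

```python
from statistics import fmean


def video_level_scores(segment_scores):
    """Average per-segment class scores into one video-level score vector.

    Averaging is an assumed aggregation rule; the abstract only says that
    segment-level models are assembled into a video-level model.
    """
    num_classes = len(segment_scores[0])
    return [fmean(seg[c] for seg in segment_scores) for c in range(num_classes)]


def fusion_feature(spatial, temporal, spatio_temporal):
    """Concatenate the video-level scores of the three models.

    The concatenated vector is what a fusion classifier (an SVM in the
    paper) would take as input; concatenation itself is an assumption.
    """
    return (
        video_level_scores(spatial)
        + video_level_scores(temporal)
        + video_level_scores(spatio_temporal)
    )


# Toy data: one video, two segments per model, two action classes.
spatial = [[0.8, 0.2], [0.6, 0.4]]
temporal = [[0.5, 0.5], [0.7, 0.3]]
spatio_temporal = [[0.9, 0.1], [0.7, 0.3]]

feature = fusion_feature(spatial, temporal, spatio_temporal)
```

Each video yields one fixed-length vector regardless of how many segments were sampled, which is what lets a single classifier operate at the video level rather than on individual segments.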
Year
2019
DOI
10.1145/3357384.3357935
Venue
Proceedings of the 28th ACM International Conference on Information and Knowledge Management
Keywords
3D convolution, action recognition, multi-model fusion, video-level recognition
Field
Information retrieval, Computer science, Action recognition, Fusion, Artificial intelligence, Machine learning
DocType
Conference
ISBN
978-1-4503-6976-3
Citations
0
PageRank
0.34
References
0
Authors
6
Name             Order  Citations  PageRank
Xiaomin Wang     1      0          0.34
Junsan Zhang     2      1          1.73
Leiquan Wang     3      48         7.63
Philip S. Yu     4      30670      3474.16
Jie Zhu          5      0          0.34
Haisheng Li      6      10         10.14