Title
Video Action Recognition With an Additional End-to-End Trained Temporal Stream
Abstract
Detecting actions in videos requires understanding the temporal relationships among frames. Typical action recognition approaches rely on optical flow estimation methods to convey temporal information to a CNN. Recent studies employ 3D convolutions in addition to optical flow to process the temporal information. While these models achieve slightly better results than two-stream 2D convolutional approaches, they are significantly more complex, requiring more data and time to train. We propose an efficient, adaptive batch size distributed training algorithm with customized optimizations for training the two 2D streams. We introduce a new 2D convolutional temporal stream that is trained end-to-end with a neural network. The ability to freeze some layers of this temporal stream during training makes ensemble learning with more than one temporal stream possible. Our three-stream architecture achieves the highest accuracies we know of on UCF101 and HMDB51 among systems that do not pretrain on much larger datasets (e.g., Kinetics). We achieve these results while keeping our spatial and temporal streams 4.67x faster to train than the 3D convolution approaches.
Year: 2019
DOI: 10.1109/WACV.2019.00013
Venue: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)
Keywords: Three-dimensional displays, Streaming media, Two dimensional displays, Training, Optical imaging, Estimation, Computer architecture
Field: Pattern recognition, Computer science, Convolution, End-to-end principle, Action recognition, Optical flow estimation, Artificial intelligence, Artificial neural network, Optical imaging, Optical flow, Ensemble learning
DocType: Conference
ISSN: 2472-6737
ISBN: 978-1-7281-1975-5
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name | Order | Citations | PageRank
Guojing Cong | 1 | 354 | 33.48
Giacomo Domeniconi | 2 | 0 | 1.01
Joshua Shapiro | 3 | 0 | 0.68
Chih-Chieh Yang | 4 | 127 | 13.88
Barry Chen | 5 | 25 | 3.42