Title
Exploiting Objects with LSTMs for Video Categorization.
Abstract
Temporal dynamics play an important role in video classification. In this paper, we propose to leverage high-level semantic features to open the "black box" of the state-of-the-art temporal model, Long Short-Term Memory (LSTM), with the aim of understanding what it learns. More specifically, we first extract object features from a state-of-the-art CNN model trained to recognize 20K objects. We then feed the extracted features into an LSTM to capture the temporal dynamics in videos. In combination with spatial and motion information, we achieve improvements in supervised video categorization. Furthermore, by masking the inputs, we demonstrate what the LSTM learns, namely (i) which objects are crucial for recognizing a class of interest, and (ii) how the LSTM model can assist the temporal localization of these detected objects.
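The pipeline described in the abstract (per-frame object scores from a CNN fed into an LSTM video classifier, with input masking used to reveal which objects drive a class prediction) could be sketched roughly as below. This is a minimal illustrative sketch assuming a PyTorch setup; the class ObjectLSTMClassifier, the object_importance helper, and all dimensions are assumptions, not the authors' implementation.

# Illustrative PyTorch sketch (not the authors' code): per-frame object scores
# from a CNN are fed into an LSTM video classifier, and an input-masking probe
# measures how much each object contributes to a class prediction.
import torch
import torch.nn as nn

NUM_OBJECTS = 1000   # the paper uses ~20K object categories; smaller here to keep the toy light
NUM_CLASSES = 101    # assumed number of video classes

class ObjectLSTMClassifier(nn.Module):
    def __init__(self, num_objects=NUM_OBJECTS, hidden_size=512, num_classes=NUM_CLASSES):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_objects, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, object_feats):
        # object_feats: (batch, num_frames, num_objects) object scores per frame
        _, (h_n, _) = self.lstm(object_feats)
        return self.fc(h_n[-1])          # class logits from the last hidden state

def object_importance(model, object_feats, class_idx, object_idx):
    # Drop in the class score when one object channel is zeroed across all frames.
    model.eval()
    with torch.no_grad():
        base = model(object_feats)[:, class_idx]
        masked = object_feats.clone()
        masked[:, :, object_idx] = 0.0
        return base - model(masked)[:, class_idx]

# Toy usage: 2 videos, 30 frames each.
feats = torch.rand(2, 30, NUM_OBJECTS)
model = ObjectLSTMClassifier()
print(model(feats).shape)                                    # torch.Size([2, 101])
print(object_importance(model, feats, class_idx=0, object_idx=42))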
Year
2016
DOI
10.1145/2964284.2967199
Venue
ACM Multimedia
Field
Black box, Computer vision, Categorization, Masking, Computer science, Long short-term memory, Artificial intelligence, Deep learning, Machine learning
DocType
Conference
Citations
5
PageRank
0.41
References
14
Authors
6
Name                Order  Citations  PageRank
Yongqing Sun        1      6          3.46
Zuxuan Wu           2      496        29.79
Xi Wang             3      553        24.14
Hiroyuki Arai       4      64         13.25
Tetsuya Kinebuchi   5      9          3.17
Yu-Gang Jiang       6      3071       152.58