Title
Spatio-temporal fisher vector coding for surveillance event detection
Abstract
We present a generic event detection system evaluated in the Surveillance Event Detection (SED) task of TRECVID 2012. We investigate a statistical approach with spatio-temporal features applied to the seven event classes defined by the SED task. The approach is based on local spatio-temporal descriptors, called MoSIFT, which are computed from pairs of video frames. A Gaussian Mixture Model (GMM) is learned to model the distribution of these low-level features. Then, for each sliding window, Fisher vector (FV) encoding [improvedFV] is used to generate the sample representation, and a linear SVM is trained for each event. The main novelty of our system is the introduction of Fisher vector encoding, which has demonstrated great success in image classification, into video event detection. The key idea is to model the low-level visual features with a GMM and to generate an intermediate vector representation for the bag of features. Instead of the histograms used in the standard bag-of-words (BoW) model, FV encoding uses higher-order statistics. FV has several good properties: (a) it naturally separates the video-specific information from the noisy local features, and (b) it allows a linear model to be used on this representation. We build an efficient implementation of FV encoding that runs 10 times faster than real time. We also feed non-trivial object localization techniques, e.g. multi-scale detection and non-maximum suppression, into the video event detection. This approach outperformed all other teams' submissions in TRECVID SED 2012 on four of the seven event types.
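The encoding step described in the abstract can be illustrated with a minimal pure-Python sketch of the improved Fisher vector of the kind cited as [improvedFV]: gradients of the descriptor log-likelihood with respect to the GMM means and standard deviations, followed by signed square-root (power) normalization and L2 normalization. This is not the authors' optimized implementation. It assumes a diagonal-covariance GMM has already been fitted to MoSIFT-like descriptors, and the function names and the toy 2-D data are hypothetical. Real MoSIFT descriptors are much higher-dimensional, and a practical system would use vectorized numerics.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gaussian_diag(x: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Log-density of a diagonal Gaussian; sigma holds per-dimension standard deviations."""
    return -0.5 * sum(
        math.log(2.0 * math.pi) + 2.0 * math.log(s) + ((xi - m) / s) ** 2
        for xi, m, s in zip(x, mu, sigma)
    )


def posteriors(x: Sequence[float], weights: Sequence[float],
               means: Sequence[Vector], sigmas: Sequence[Vector]) -> Vector:
    """Soft assignment gamma_k(x) of one descriptor to each GMM component (log-sum-exp for stability)."""
    logs = [math.log(w) + log_gaussian_diag(x, m, s)
            for w, m, s in zip(weights, means, sigmas)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector(descriptors: Sequence[Vector], weights: Sequence[float],
                  means: Sequence[Vector], sigmas: Sequence[Vector]) -> Vector:
    """Hypothetical helper: encode the descriptors of one sliding window as a 2*K*D Fisher vector."""
    T, K, D = len(descriptors), len(weights), len(means[0])
    g_mu = [[0.0] * D for _ in range(K)]     # accumulated first-order statistics
    g_sigma = [[0.0] * D for _ in range(K)]  # accumulated second-order statistics
    for x in descriptors:
        gamma = posteriors(x, weights, means, sigmas)
        for k in range(K):
            for d in range(D):
                u = (x[d] - means[k][d]) / sigmas[k][d]  # whitened deviation from the mean
                g_mu[k][d] += gamma[k] * u
                g_sigma[k][d] += gamma[k] * (u * u - 1.0)
    fv: Vector = []
    for k in range(K):  # gradient w.r.t. the means
        fv.extend(v / (T * math.sqrt(weights[k])) for v in g_mu[k])
    for k in range(K):  # gradient w.r.t. the standard deviations
        fv.extend(v / (T * math.sqrt(2.0 * weights[k])) for v in g_sigma[k])
    # Power normalization: sign(z) * |z|^0.5.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    # L2 normalization, so a linear SVM can compare windows directly.
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv


if __name__ == "__main__":
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [3.0, 3.0]]
    sigmas = [[1.0, 1.0], [1.0, 1.0]]
    window = [[0.1, -0.2], [2.9, 3.3], [0.4, 0.0]]  # toy descriptors from one window
    print(len(fisher_vector(window, weights, means, sigmas)))  # 2 * K * D = 8
```

Each window yields a fixed-length vector regardless of how many descriptors it contains. That fixed-length, normalized form is what allows a per-event linear classifier, such as the linear SVM mentioned in the abstract, to score sliding windows cheaply.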
Year
2013
DOI
10.1145/2502081.2502155
Venue
ACM Multimedia 2013
Keywords
Gaussian mixture model, linear model, surveillance event detection, video event detection, FV encoding, Fisher vector encoding, intermediate vector representation, spatio-temporal Fisher vector, generic event detection system, event type, event class, Fisher vector, system
Field
Histogram, Computer science, Artificial intelligence, Contextual image classification, Computer vision, Sliding window protocol, Pattern recognition, Linear model, TRECVID, Higher-order statistics, Speech recognition, Mixture model, Encoding (memory)
DocType
Conference
Citations
11
PageRank
0.53
References
15
Authors
9
Name | Order | Citations | PageRank
Qiang Chen | 1 | 440 | 34.87
Yang Cai | 2 | 121 | 13.03
Lisa Brown | 3 | 11 | 0.53
Ankur Datta | 4 | 107 | 7.96
Quanfu Fan | 5 | 504 | 32.69
Rogério Feris | 6 | 1529 | 89.95
Shuicheng Yan | 7 | 9701 | 359.54
Alexander G. Hauptmann | 8 | 7472 | 558.23
Sharath Pankanti | 9 | 11 | 0.53