Title
Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor
Abstract
Event detection plays an essential role in video content analysis and remains a challenging open problem. In particular, research on detecting human-related video events in complex scenes that contain both crowds of people and dynamic motion is still limited. In this paper, we investigate detecting video events that involve elementary human actions, e.g. making a cellphone call, putting an object down, and pointing to something, in complex scenes using a novel spatio-temporal descriptor based approach. The proposed spatio-temporal descriptor temporally integrates the statistics of a set of response maps of low-level features, e.g. image gradients and optical flow, within a space-time cube, and so captures the characteristics of actions in terms of both their appearance and their motion patterns. Based on these descriptors, the bag-of-words method is used to describe a human figure as a concise feature vector. These features are then used to train SVM classifiers at multiple spatial pyramid levels to distinguish different actions. Finally, Gaussian kernel based temporal filtering is applied to segment event sequences from a video stream, taking into account the temporal consistency of actions. The proposed approach tolerates spatial layout variations and local deformations of human actions caused by diverse view angles and rough human figure alignment in complex scenes. Extensive experiments on the 50-hour video dataset of the TRECVid 2008 event detection task demonstrate that our approach outperforms well-known SIFT descriptor based methods and effectively detects video events under challenging real-world conditions.
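The last step described in the abstract, Gaussian kernel based temporal filtering to segment events from a video stream, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each frame already carries one real-valued classifier score for a single action (for example an SVM decision value, positive when the action is judged present), and the function names `gaussian_kernel`, `smooth_scores`, and `segment_events` and the parameters `sigma`, `threshold`, and `min_len` are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Tuple

# Sketch only: assumes one real-valued score per frame for a single action
# (e.g. an SVM decision value). All names and defaults are hypothetical.


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_scores(scores: List[float], sigma: float = 2.0) -> List[float]:
    """Gaussian-weighted average of per-frame scores.

    Near the start and end of the stream, the weights that fall inside the
    sequence are renormalized so border frames are not biased toward zero.
    """
    radius = max(1, math.ceil(3 * sigma))  # cover about +/- 3 sigma
    kernel = gaussian_kernel(sigma, radius)
    n = len(scores)
    smoothed = []
    for t in range(n):
        acc = 0.0
        norm = 0.0
        for k in range(-radius, radius + 1):
            i = t + k
            if 0 <= i < n:
                w = kernel[k + radius]
                acc += w * scores[i]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def segment_events(scores: List[float], sigma: float = 2.0,
                   threshold: float = 0.0,
                   min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, of detected events.

    A frame belongs to an event when its smoothed score exceeds `threshold`;
    runs shorter than `min_len` frames are dropped as spurious.
    """
    smoothed = smooth_scores(scores, sigma)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, s in enumerate(smoothed):
        if s > threshold and start is None:
            start = t  # an event run opens
        elif s <= threshold and start is not None:
            if t - start >= min_len:
                segments.append((start, t))
            start = None  # the run closes
    # Close a run that is still open at the end of the stream.
    if start is not None and len(smoothed) - start >= min_len:
        segments.append((start, len(smoothed)))
    return segments


if __name__ == "__main__":
    # Hypothetical per-frame scores: a one-frame false alarm at frame 10,
    # then a sustained ten-frame action at frames 21..30.
    frames = [-1.0] * 10 + [1.0] + [-1.0] * 10 + [1.0] * 10 + [-1.0] * 10
    print(segment_events(frames))  # [(21, 31)]
```

With `sigma = 2`, the isolated one-frame spike is smoothed below zero and discarded, while the sustained run survives as a single segment. A larger `sigma` enforces more temporal consistency at the cost of less precise event boundaries.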
Year
2009
DOI
10.1145/1631272.1631297
Venue
ACM Multimedia 2009
Keywords
detecting video event, human figure, human-related video event, video event, detects video event, elementary human action, human action, 50-hour video dataset, action recognition, complex scene, video stream, spatio-temporal descriptor, video content analysis, optical flow, bag of words, gaussian kernel, feature vector, space time
Field
Scale-invariant feature transform, Computer vision, Feature vector, Pattern recognition, Computer science, TRECVID, Support vector machine, Filter (signal processing), Video tracking, Video content analysis, Artificial intelligence, Gaussian function
DocType
Conference
Citations
33
PageRank
1.73
References
29
Authors
5
Name | Order | Citations | PageRank
Zhao Zhao | 1 | 830 | 44.72
Ming Yang | 2 | 3471 | 162.50
Kai Yu | 3 | 4799 | 255.21
Wei Xu | 4 | 3533 | 207.17
Yihong Gong | 5 | 7300 | 470.57