Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor - Citegraph

Paper Info

Title
Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor

Abstract
Event detection plays an essential role in video content analysis and remains a challenging open problem. In particular, research on detecting human-related video events in complex scenes that contain both crowds of people and dynamic motion is still limited. In this paper, we investigate detecting video events that involve elementary human actions, e.g. making a cellphone call, putting an object down, and pointing to something, in complex scenes using a novel approach based on a spatio-temporal descriptor. The proposed descriptor temporally integrates the statistics of a set of response maps of low-level features, e.g. image gradients and optical flows, in a space-time cube, in order to capture the appearance and motion patterns of actions. Based on these descriptors, the bag-of-words method is used to describe a human figure as a concise feature vector. These features are then employed to train SVM classifiers at multiple spatial pyramid levels to distinguish different actions. Finally, Gaussian kernel based temporal filtering is applied to segment the sequences of events from a video stream, taking into account the temporal consistency of actions. The proposed approach tolerates the spatial layout variations and local deformations of human actions that arise from diverse view angles and rough human figure alignment in complex scenes. Extensive experiments on the 50-hour video dataset of the TRECVid 2008 event detection task demonstrate that our approach outperforms the well-known SIFT descriptor based methods and effectively detects video events in challenging real-world conditions.
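
The final step of the abstract, Gaussian kernel based temporal filtering, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the kernel width, the decision rule, or the border handling, so the function names, `sigma`, the fixed `threshold`, and the renormalization at sequence borders are all assumptions made for illustration. The idea shown is only that smoothing per-frame classifier scores over time suppresses isolated spurious detections before contiguous runs of frames are grouped into event segments.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_scores(scores, sigma=2.0):
    """Smooth per-frame scores with a Gaussian kernel.

    Assumption: at the sequence borders the kernel is renormalized over
    the frames that actually exist, so no padding values are invented.
    """
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    smoothed = []
    for t in range(len(scores)):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(kernel):
            j = t + k - radius
            if 0 <= j < len(scores):
                acc += w * scores[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def segment_events(scores, threshold=0.5, sigma=2.0):
    """Return (start, end) frame pairs, end inclusive, where the
    smoothed score stays at or above the (hypothetical) threshold."""
    smoothed = smooth_scores(scores, sigma)
    segments = []
    start = None
    for t, s in enumerate(smoothed):
        if s >= threshold and start is None:
            start = t
        elif s < threshold and start is not None:
            segments.append((start, t - 1))
            start = None
    if start is not None:
        segments.append((start, len(smoothed) - 1))
    return segments
```

In this sketch a single-frame spike in the raw scores is averaged down below the threshold and produces no segment, whereas a sustained run of high scores survives the smoothing and is reported as one event.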

Year	DOI	Venue
2009	10.1145/1631272.1631297	ACM Multimedia 2009
Keywords	Field	DocType
detecting video event,human figure,human-related video event,video event,detects video event,elementary human action,human action,50-hour video dataset,action recognition,complex scene,video stream,spatio-temporal descriptor,video content analysis,optical flow,bag of words,gaussian kernel,feature vector,space time	Scale-invariant feature transform,Computer vision,Feature vector,Pattern recognition,Computer science,TRECVID,Support vector machine,Filter (signal processing),Video tracking,Video content analysis,Artificial intelligence,Gaussian function	Conference
Citations	PageRank	References
33	1.73	29
Authors
5

Name	Order	Citations	PageRank
Zhao Zhao	1	830	44.72
Ming Yang	2	3471	162.50
Kai Yu	3	4799	255.21
Wei Xu	4	3533	207.17
Yihong Gong	5	7300	470.57