Abstract |
---|
Current state-of-the-art systems for visual content analysis require large training sets for each class of interest, and performance degrades rapidly with fewer examples. In this paper, we present a general framework for the zero-shot learning problem of performing high-level event detection with no training exemplars, using only textual descriptions. This task goes beyond the traditional zero-shot framework of adapting a given set of classes with training data to unseen classes. We leverage video and image collections with free-form text descriptions from widely available web sources to learn a large bank of concepts, in addition to using several off-the-shelf concept detectors, speech, and video text for representing videos. We utilize natural language processing technologies to generate event description features. The extracted features are then projected to a common high-dimensional space using text expansion, and similarity is computed in this space. We present extensive experimental results on the large TRECVID MED [26] corpus to demonstrate our approach. Our results show that the proposed concept detection methods significantly outperform current attribute classifiers such as Classemes [34], ObjectBank [21], and SUN attributes [28]. Further, we find that fusion, both within as well as between modalities, is crucial for optimal performance. |
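The abstract's core idea — representing both the textual event description and each video as scores over a shared concept vocabulary, then computing similarity in that common space — can be sketched minimally as follows. This is an illustration only, not the paper's actual pipeline: the vocabulary, the naive term-matching "text expansion", and all scores below are invented assumptions.

```python
import math

# Hypothetical shared concept vocabulary; the paper learns a much larger
# concept bank from web video/image collections with text descriptions.
CONCEPT_VOCAB = ["dog", "park", "ball", "person", "car"]

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def text_to_concept_vector(description, vocab=CONCEPT_VOCAB):
    """Project an event description onto the concept vocabulary.
    Simple token matching stands in for the NLP-based text expansion."""
    tokens = description.lower().split()
    return [1.0 if concept in tokens else 0.0 for concept in vocab]

def rank_videos(description, video_concept_scores):
    """Rank videos by similarity to the event description in concept space."""
    query = text_to_concept_vector(description)
    scored = [(vid, cosine_similarity(query, scores))
              for vid, scores in video_concept_scores.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy "videos": invented concept-detector scores over the shared vocabulary.
videos = {
    "vid_a": [0.9, 0.8, 0.7, 0.5, 0.0],  # e.g. a dog playing in a park
    "vid_b": [0.0, 0.1, 0.0, 0.6, 0.9],  # e.g. street traffic
}
ranking = rank_videos("a dog chases a ball in the park", videos)
print(ranking[0][0])  # highest-scoring video for the description
```

In the paper, multiple such similarity scores (from different concept banks, speech, and video text) would then be fused within and across modalities; here a single modality is shown for brevity.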
Year | DOI | Venue |
---|---|---|
2014 | 10.1109/CVPR.2014.341 | CVPR |
Keywords | Field | DocType
---|---|---
web sites, video event detection, high-level event detection, text expansion, zero-shot framework, zero-shot learning problem, zero-shot learning, zero-shot event detection, multimodal fusion, training sets, TRECVID MED corpus, extracted features, common high-dimensional space, feature extraction, weakly supervised concepts, event description features, textual descriptions, visual content analysis, natural language processing, web sources, concept detection, video collections, image collections, free-form text descriptions, vectors, visualization, speech, support vector machines, detectors | Modalities, Computer science, Artificial intelligence, Detector, Computer vision, Content analysis, Pattern recognition, Visualization, TRECVID, Support vector machine, Feature extraction, Machine learning, Modal | Conference
ISSN | Citations | PageRank
---|---|---
1063-6919 | 43 | 1.12 |
References | Authors
---|---
28 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Shuang Wu | 1 | 171 | 7.23 |
Sravanthi Bondugula | 2 | 43 | 1.79 |
Florian Luisier | 3 | 43 | 1.12 |
Xiaodan Zhuang | 4 | 433 | 24.71 |
Premkumar Natarajan | 5 | 874 | 79.46 |