Title
Relevance filtering meets active learning: improving web-based concept detectors
Abstract
We address the challenge of training visual concept detectors on web video available from portals such as YouTube. In contrast to high-quality but small manually acquired training sets, this setup allows concept detection to scale to very large training sets and concept vocabularies. On the downside, web tags are only weak indicators of concept presence, and web video training data contains a large amount of non-relevant content. So far, there are two general strategies for overcoming this label noise problem, both aimed at discarding non-relevant training content: (1) manual refinement supported by active learning sample selection, and (2) automatic refinement using relevance filtering. In this paper, we present a highly efficient approach that combines these two strategies in an interleaved setup: manually refined samples are used directly to improve relevance filtering, which in turn provides a good basis for the next active learning sample selection. Our results demonstrate that the proposed combination, called active relevance filtering, outperforms both purely automatic filtering and manual filtering based on active learning. For example, with 50 manual labels per concept, it improves on automatic filtering by 5% and on active learning by 6%. Annotating only 25% of the weak positive samples in the training set yields performance comparable to training on ground truth labels.
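The interleaved setup described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the relevance model or the selection rule, so the distance-based scoring, the uncertainty criterion (score closest to 0.5), the batch size, and all function and parameter names (`relevance_scores`, `active_relevance_filtering`, `oracle`, `budget`, `batch`) are assumptions chosen for illustration only.

```python
import math


def centroid(points):
    """Unweighted mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def relevance_scores(weak_pos, negatives, relevance):
    """Toy relevance filter (an assumption, not the paper's model).

    Scores each weakly tagged sample in (0, 1) by comparing its distance
    to the relevance-weighted centroid of the weak positives with its
    distance to the centroid of the negatives.
    """
    total = max(sum(relevance), 1e-9)  # guard against all-zero weights
    dim = len(weak_pos[0])
    pos_c = [
        sum(r * x[i] for r, x in zip(relevance, weak_pos)) / total
        for i in range(dim)
    ]
    neg_c = centroid(negatives)
    scores = []
    for x in weak_pos:
        d_pos = math.dist(x, pos_c)
        d_neg = math.dist(x, neg_c)
        # Closer to the positive centroid than to the negative one -> near 1.
        scores.append(1.0 / (1.0 + math.exp(d_pos - d_neg)))
    return scores


def active_relevance_filtering(weak_pos, negatives, oracle, budget, batch=5):
    """Interleave automatic relevance filtering with manual labeling.

    oracle(i) stands in for a human annotator and returns True if weak
    positive i really shows the concept. Returns (relevance, labeled).
    """
    relevance = [0.5] * len(weak_pos)  # relevance unknown at the start
    labeled = {}
    budget = min(budget, len(weak_pos))
    while True:
        # Automatic step: re-estimate relevance from the current weights.
        relevance = relevance_scores(weak_pos, negatives, relevance)
        # Manual labels always override the automatic estimate.
        for i, y in labeled.items():
            relevance[i] = 1.0 if y else 0.0
        if len(labeled) >= budget:
            return relevance, labeled
        # Active learning step: query the most uncertain unlabeled samples.
        unlabeled = [i for i in range(len(weak_pos)) if i not in labeled]
        unlabeled.sort(key=lambda i: abs(relevance[i] - 0.5))
        for i in unlabeled[: min(batch, budget - len(labeled))]:
            labeled[i] = bool(oracle(i))
            relevance[i] = 1.0 if labeled[i] else 0.0
```

In this sketch the returned relevance values could serve as per-sample weights when training a concept detector, with manually labeled samples fixed at 0 or 1. Whether the paper uses the scores this way is not stated in the abstract.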
Year: 2010
DOI: 10.1145/1743384.1743397
Venue: Multimedia Information Retrieval
Keywords: active relevance, training set, non-relevant training content, active learning sample selection, active learning, concept detection, concept vocabulary, web video training data, web-based concept detector, concept presence, large training set, ground truth
Field: Data mining, Computer science, Artificial intelligence, Web application, Detector, Training set, Active learning, Information retrieval, Pattern recognition, Filter (signal processing), Ground truth, Sample selection, Machine learning
DocType: Conference
Citations: 5
PageRank: 0.52
References: 27
Authors: 3
Name: Damian Borth, Order: 1, Citations: 764, PageRank: 49.45
Name: Adrian Ulges, Order: 2, Citations: 328, PageRank: 26.61
Name: Thomas M. Breuel, Order: 3, Citations: 2362, PageRank: 219.10