Title
Unsupervised Audio Analysis For Categorizing Heterogeneous Consumer Domain Videos
Abstract
The ever-increasing volume of consumer domain videos on the Internet has led to a surge of interest in automatically analyzing such content. The audio signal in these videos contains salient information, but applying current automatic speech recognition (ASR) techniques is not viable due to high variability, noise, and multilingual content. We present two unsupervised techniques that address these challenges without relying on ASR. The first method learns an unsupervised codebook by clustering audio features, and the second directly matches low-level features using the pyramid match kernel (PMK). Experimental results on an approximately 200-hour audio corpus downloaded from YouTube show that both approaches significantly outperform the traditional approach of first segmenting the audio stream into a set of mid-level classes (e.g., speech, non-speech, music, silence) and then using the duration statistics of these classes to train high-level classifiers.
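The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm or feature type was used, so plain k-means over generic feature frames is an assumption here, as are all function names (`learn_codebook`, `encode`, `pyramid_match`) and parameters. The pyramid match follows the commonly cited formulation with bins of side 2^i and weights 1/2^i, and assumes non-negative feature values.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, frame):
    """Index of the codeword closest to one feature frame."""
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], frame))


def learn_codebook(frames, k, iters=20, seed=0):
    """Assumed clustering step: Lloyd's k-means over pooled, unlabeled frames."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in frames:
            groups[nearest(codebook, f)].append(f)
        for j, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster goes empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def encode(codebook, frames):
    """Normalized histogram of codeword counts for one clip."""
    hist = [0.0] * len(codebook)
    for f in frames:
        hist[nearest(codebook, f)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def pyramid_match(x, y, levels):
    """Pyramid match between two sets of non-negative feature frames.

    Level i uses bins of side 2**i; matches first found at level i
    are weighted by 1 / 2**i.
    """
    score, prev = 0.0, 0
    for i in range(levels + 1):
        side = 2 ** i
        hx = Counter(tuple(int(v // side) for v in f) for f in x)
        hy = Counter(tuple(int(v // side) for v in f) for f in y)
        # Histogram intersection; Counter returns 0 for missing bins.
        cur = sum(min(c, hy[b]) for b, c in hx.items())
        score += (cur - prev) / side
        prev = cur
    return score


# Toy data: two well-separated groups of 2-D "frames" (hypothetical values).
frames = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
codebook = learn_codebook(frames, k=2)
clip_histogram = encode(codebook, frames)

set_a = [(0.5, 0.5), (3.5, 3.5)]
set_b = [(0.6, 0.4), (1.5, 1.5)]
similarity = pyramid_match(set_a, set_b, levels=2)
```

In the first approach, each clip's codeword histogram would serve as the fixed-length input to a high-level classifier. In the second, the pyramid match value would be used directly as a kernel between two clips' feature sets, so no codebook is needed.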
Year
2011
Venue
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5
Keywords
Web Audio Retrieval, Pyramid Match Kernel, Audio Codebook Learning
Field
Computer science, Speech recognition, Audio analyzer
DocType
Conference
Citations
1
PageRank
0.38
References
1
Authors
5
Name                   Order   Citations   PageRank
Premkumar Natarajan    1       874         79.46
Stavros Tsakalidis     2       213         13.83
Vasant Manohar         3       299         16.18
Rohit Prasad           4       465         39.06
Premkumar Natarajan    5       5           1.47