Unsupervised Audio Analysis For Categorizing Heterogeneous Consumer Domain Videos - Citegraph

Paper Info

Title
Unsupervised Audio Analysis For Categorizing Heterogeneous Consumer Domain Videos

Abstract
The ever-increasing volume of consumer domain videos on the Internet has led to a surge of interest in automatically analyzing such content. The audio signal in these videos contains salient information, but applying current automatic speech recognition (ASR) techniques is not viable due to high variability, noise, and multilingual content. We present two unsupervised techniques that do not rely on ASR to address these challenges. The first method involves learning an unsupervised codebook by clustering audio features, and the second involves directly matching low-level features using the pyramid match kernel (PMK). Experimental results on an approximately 200-hour audio corpus downloaded from YouTube show that both of our approaches significantly outperform the traditional approach of first segmenting the audio stream into a set of mid-level classes (e.g. speech, non-speech, music, silence) and then using the duration statistics of these classes to train high-level classifiers.
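The two techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the audio features, the codebook size, the clustering algorithm, or the classifier, so everything below is an assumption chosen for illustration. Plain k-means stands in for the codebook learning step, frame-level features are assumed to be already extracted and scaled into [0, 1), and the function names learn_codebook, encode, and pyramid_match are hypothetical.

```python
from collections import Counter

import numpy as np


def learn_codebook(features, k, iters=20, seed=0):
    """Cluster frame-level feature vectors with plain k-means (an assumed choice)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every frame to its nearest center (squared Euclidean distance).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the previous center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def encode(features, centers):
    """Represent one clip as a normalized histogram of codeword counts."""
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(centers))
    return counts / counts.sum()


def _grid_hist(points, cells, lo, hi):
    """Count points per cell of a uniform grid with `cells` cells per dimension."""
    idx = np.floor((points - lo) / (hi - lo) * cells).astype(int)
    idx = np.clip(idx, 0, cells - 1)
    return Counter(map(tuple, idx))


def pyramid_match(x, y, levels=4, lo=0.0, hi=1.0):
    """Unnormalized pyramid match score between two sets of feature vectors.

    Level 0 uses the finest grid; the cell width doubles at each level, and
    matches first found at level i are weighted by 1 / 2**i.
    """
    score, prev = 0.0, 0
    for i in range(levels):
        cells = 2 ** (levels - 1 - i)
        hx = _grid_hist(x, cells, lo, hi)
        hy = _grid_hist(y, cells, lo, hi)
        inter = sum(min(c, hy[key]) for key, c in hx.items())  # histogram intersection
        score += (inter - prev) / 2 ** i  # count only newly formed matches
        prev = inter
    return score
```

In this sketch, the histogram returned by encode would be the clip-level input to a conventional classifier, while pyramid_match compares two clips directly from their frame-level features without building a codebook. Dividing the score by the square root of pyramid_match(x, x) * pyramid_match(y, y) is a common way to reduce the effect of clip length.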

Year	Venue	Keywords
2011	12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5	Web Audio Retrieval, Pyramid Match Kernel, Audio Codebook Learning
Field	DocType	Citations
Computer science, Speech recognition, Audio analyzer	Conference	1
PageRank	References	Authors
0.38	1	5

Authors (5 rows)

Name	Order	Citations	PageRank
Premkumar Natarajan	1	874	79.46
Stavros Tsakalidis	2	213	13.83
Vasant Manohar	3	299	16.18
Rohit Prasad	4	465	39.06
Premkumar Natarajan	5	5	1.47
