Abstract | ||
---|---|---|
The ever-increasing volume of consumer-domain videos on the Internet has led to a surge of interest in automatically analyzing such content. The audio signal in these videos contains salient information, but applying current automatic speech recognition (ASR) techniques is not viable due to high variability, noise, and multilingual content. We present two unsupervised techniques that do not rely on ASR to address these challenges. The first method involves learning an unsupervised codebook by clustering audio features, and the second involves directly matching low-level features using the pyramid match kernel (PMK). Experimental results on an approximately 200-hour audio corpus downloaded from YouTube show that both our approaches significantly outperform the traditional approach of first segmenting the audio stream into a set of mid-level classes (e.g. speech, non-speech, music, silence) and using the duration statistics of these classes to train high-level classifiers. |
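
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the clustering algorithm, or the PMK binning, so plain k-means, a uniform grid over features scaled to [0, 1), and all function names and parameters below (`learn_codebook`, `encode`, `pyramid_match`, `k`, `levels`) are assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def _nearest(features, centroids):
    """Index of the closest centroid (squared Euclidean) for each frame."""
    diffs = features[:, None, :] - centroids[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)


def learn_codebook(features, k, iters=20, seed=0):
    """Plain k-means (an assumed choice) over frame-level feature vectors."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(features), size=k, replace=False)
    centroids = features[picks].astype(float)
    for _ in range(iters):
        labels = _nearest(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # leave a centroid in place if its cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def encode(features, centroids):
    """Represent a clip as a normalized histogram of codeword counts."""
    labels = _nearest(features, centroids)
    hist = np.bincount(labels, minlength=len(centroids)).astype(float)
    return hist / hist.sum()


def _grid_hist(points, bins):
    """Count points per cell of a uniform grid with `bins` cells per axis."""
    cells = np.floor(points * bins).astype(int)
    return Counter(tuple(c) for c in cells.tolist())


def pyramid_match(x, y, levels=4):
    """Unnormalized pyramid match score between two feature sets.

    Assumes every feature value lies in [0, 1). Level 0 is the finest grid;
    matches first found at level i are weighted by 1 / 2**i.
    """
    score, previous = 0.0, 0
    for i in range(levels):
        bins = 2 ** (levels - 1 - i)
        # Counter & Counter keeps the per-cell minimum (histogram intersection).
        matched = sum((_grid_hist(x, bins) & _grid_hist(y, bins)).values())
        score += (matched - previous) / 2 ** i
        previous = matched
    return score


# Hypothetical data: random vectors standing in for low-level audio features.
rng = np.random.default_rng(1)
clip_a = rng.random((200, 3))
clip_b = rng.random((150, 3))
codebook = learn_codebook(np.vstack([clip_a, clip_b]), k=8)
hist_a = encode(clip_a, codebook)
similarity = pyramid_match(clip_a, clip_b)
```

In this sketch the codebook histogram (`hist_a`) would serve as a fixed-length clip descriptor for a downstream classifier, while `pyramid_match` compares two variable-length feature sets directly without quantizing them against a learned codebook.
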
Year | Venue | Keywords |
---|---|---|
2011 | 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | Web Audio Retrieval, Pyramid Match Kernel, Audio Codebook Learning |

Field | DocType | Citations |
---|---|---|
Computer science, Speech recognition, Audio analyzer | Conference | 1 |

PageRank | References | Authors |
---|---|---|
0.38 | 1 | 5 |

Name | Order | Citations | PageRank |
---|---|---|---|
Premkumar Natarajan | 1 | 874 | 79.46 |
Stavros Tsakalidis | 2 | 213 | 13.83 |
Vasant Manohar | 3 | 299 | 16.18 |
Rohit Prasad | 4 | 465 | 39.06 |
Premkumar Natarajan | 5 | 5 | 1.47 |