Title
Lost in segmentation: Three approaches for speech/non-speech detection in consumer-produced videos
Abstract
Traditional speech/non-speech segmentation systems have been designed for specific acoustic conditions, such as broadcast news or meetings. However, little research has been done on consumer-produced audio. This type of media is constantly growing and has complex characteristics such as low-quality recordings, environmental noise, and overlapping sounds. This paper presents an evaluation of three approaches to speech/non-speech detection on consumer-produced audio. The approaches are state-of-the-art speech/non-speech detectors: one based on Gaussian Mixture Models (GMM), another on Support Vector Machines (SVM), and the last on Neural Networks (NN). Using the TRECVID MED 2012 database, we designed combinations of training and testing sets to clarify what speech/non-speech detection on consumer-produced media entails and how traditional approaches to this detection perform in this domain. The results revealed that the state-of-the-art GMM and SVM systems, when tested cross-domain, underperformed a one-layer NN algorithm, which had 20% higher accuracy and processed audio 5 times faster.
Year
2013
DOI
10.1109/ICME.2013.6607486
Venue
Multimedia and Expo
Keywords
Gaussian processes, audio databases, audio signal processing, neural nets, speech recognition, support vector machines, video signal processing, GMM, Gaussian mixture models, SVM system testing, TRECVID MED 2012 database, acoustic conditions, complex characteristics, consumer-produced audio, consumer-produced media, consumer-produced videos, neural networks, nonspeech detection, one-layer NN algorithm, speech detection, audio segmentation, speech non-speech, SVM, user-generated content
Field
Speech coding, Pattern recognition, Computer science, Audio mining, Voice activity detection, TRECVID, Support vector machine, Speech recognition, Artificial intelligence, Audio signal processing, Mixture model, Acoustic model
DocType
Conference
ISSN
1945-7871
Citations
2
PageRank
0.40
References
4
Authors
2
Name | Order | Citations | PageRank
Benjamin Elizalde | 1 | 359 | 22.38
Gerald Friedland | 2 | 1127 | 96.23