Title
Robust Speech Activity Detection in Movie Audio: Data Resources and Experimental Evaluation
Abstract
Speech activity detection in highly variable acoustic conditions is a challenging task. Many approaches to detecting speech activity in such conditions rely on inherent knowledge of the noise types involved. Movie audio offers an excellent research test-bed for developing speech activity models, and robust speech detection in movie audio is also a crucial step for subsequent content analyses such as audio diarization. Obtaining labels to supervise models on such data can be very expensive and may not scale. In this paper, we employ a simple yet effective approach to obtain speech labels for movie data by coarsely aligning subtitles with the movie audio. We compiled a dataset, called the Subtitle-Aligned Movie corpus (SAM), of nearly 23 hours of data labelled as speech from ninety-five Hollywood movies. We propose convolutional neural network architectures that use log-mel spectrograms as input features to predict speech at the segment level, as opposed to the frame level. We show that our models trained on SAM outperform existing baselines on two independent, publicly released movie speech datasets. We have made the SAM corpus and pretrained models publicly available for further research.
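As a rough illustration of the pipeline the abstract describes, the sketch below shows (a) how subtitle display times could serve as coarse speech labels and (b) a minimal CNN that scores fixed-length log-mel segments as speech or non-speech. This is not the authors' released model: the sampling rate, segment length, mel-band count, network depth, and the file names are assumptions made for the example.

# Hypothetical sketch of subtitle-based labelling and segment-level speech scoring.
# All parameters (16 kHz audio, 0.64 s segments, 64 mel bands) are assumptions.
import re
import numpy as np
import librosa
import torch
import torch.nn as nn

SR = 16000            # assumed sampling rate
SEG_SAMPLES = 10240   # 0.64 s segments (assumption)
N_MELS = 64

TIME = re.compile(r"(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)")

def srt_intervals(path):
    """Read subtitle display times as coarse (start, end) speech intervals in seconds."""
    intervals = []
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            m = TIME.search(line)
            if m:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
                intervals.append((3600 * h1 + 60 * m1 + s1 + ms1 / 1000,
                                  3600 * h2 + 60 * m2 + s2 + ms2 / 1000))
    return intervals

def logmel(segment):
    """Compute a log-mel spectrogram patch for one fixed-length audio segment."""
    mel = librosa.feature.melspectrogram(
        y=segment, sr=SR, n_fft=512, hop_length=160, n_mels=N_MELS)
    return torch.from_numpy(librosa.power_to_db(mel, ref=np.max)).float().unsqueeze(0)

class SegmentSAD(nn.Module):
    """Small CNN mapping a log-mel patch to a single speech/non-speech score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(32, 1)

    def forward(self, x):
        return torch.sigmoid(self.classifier(self.features(x).flatten(1)))

# Usage (file names are placeholders): label segments from the subtitles,
# then score each 0.64 s window of the movie audio with the CNN.
labels = srt_intervals("movie.srt")
audio, _ = librosa.load("movie.wav", sr=SR)
model = SegmentSAD()
segments = [audio[i:i + SEG_SAMPLES]
            for i in range(0, len(audio) - SEG_SAMPLES, SEG_SAMPLES)]
with torch.no_grad():
    scores = [model(logmel(s).unsqueeze(0)).item() for s in segments]

In the paper's setting, segment-level prediction means each fixed-length window receives one speech/non-speech decision rather than one per frame; the windowing loop above is one simple way to realize that.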
Year
2019
DOI
10.1109/icassp.2019.8682532
Venue
2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords
Speech activity detection, movie audio, convolutional neural networks
Field
Data modeling, Pattern recognition, Task analysis, Convolutional neural network, Computer science, Voice activity detection, Spectrogram, Feature extraction, Speech recognition, Speaker diarisation, Artificial intelligence, Scalability
DocType
Conference
ISSN
1520-6149
Citations
0
PageRank
0.34
References
0
Authors
3
Name                 Order  Citations  PageRank
Rajat Hebbar         1      0          0.68
Krishna S.           2      9          8.31
Narayanan Shrikanth  3      5558       439.23