Title
Voice Activity Detection Based on Time-Delay Neural Networks
Abstract
Voice activity detection (VAD) is an important preprocessing step in many speech applications, and context information is important for VAD. Time-delay neural networks (TDNNs) capture long context with few parameters. This paper investigates a TDNN-based VAD framework. A simple chunk-based decision method is proposed to smooth the raw posteriors and decide the boundary points of utterances. To evaluate decision performance, the intersection-over-union (IoU) metric is introduced from image object detection. Experiments are conducted on the Wall Street Journal (WSJ0) corpus. Frame classification performance is measured by the area under the curve (AUC) and the equal error rate (EER). Compared with a long short-term memory baseline, the TDNN-based system achieves an average relative EER reduction of 41.26% in the matched noise condition and an average relative AUC improvement of 3.82%. The proposed decision method achieves an average relative IoU improvement of 18.74% over the moving average method.
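The chunk-based decision and the segment-level IoU metric mentioned in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration, not the paper's exact procedure: the function names (chunk_decision, segment_iou), the chunk size of 30 frames, and the 0.5 threshold are assumptions. It averages frame-level speech posteriors over fixed chunks, thresholds each chunk into speech/non-speech segments, and scores the predicted segments against a reference with frame-level IoU, analogous to box IoU in object detection.

```python
import numpy as np

def chunk_decision(posteriors, chunk_size=30, threshold=0.5):
    """Chunk-based decision sketch: average frame-level speech posteriors
    over fixed-size chunks, threshold each chunk, and convert the chunk
    labels back into (start_frame, end_frame) speech segments."""
    n = len(posteriors)
    labels = np.zeros(n, dtype=bool)
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        labels[start:end] = posteriors[start:end].mean() >= threshold
    # Collect contiguous speech runs as (start, end) pairs.
    segments, seg_start = [], None
    for i, is_speech in enumerate(labels):
        if is_speech and seg_start is None:
            seg_start = i
        elif not is_speech and seg_start is not None:
            segments.append((seg_start, i))
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, n))
    return segments

def segment_iou(pred, ref):
    """Frame-level intersection-over-union between predicted and reference
    speech segments (lists of (start, end) frame indices)."""
    def to_mask(segments, length):
        mask = np.zeros(length, dtype=bool)
        for s, e in segments:
            mask[s:e] = True
        return mask
    length = max(e for _, e in list(pred) + list(ref))
    p, r = to_mask(pred, length), to_mask(ref, length)
    union = np.logical_or(p, r).sum()
    return np.logical_and(p, r).sum() / union if union > 0 else 0.0

# Example: noisy posteriors for 200 speech frames followed by 100 non-speech frames.
post = np.concatenate([np.random.uniform(0.6, 1.0, 200),
                       np.random.uniform(0.0, 0.4, 100)])
print(segment_iou(chunk_decision(post), [(0, 200)]))
```

Chunk averaging here stands in for the smoothing step described in the abstract; the paper's actual chunk length, threshold, and boundary-decision rule may differ.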
Year
2019
DOI
10.1109/APSIPAASC47483.2019.9023262
Venue
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords
voice activity detection, speech applications, time-delay neural networks capture long context information, VAD framework, smooth raw posteriors, decision performance, image object detection, Wall Street Journal corpus, equal error rate, TDNN based system, IoU relative improvement, EER relative reduction, chunk based decision method
DocType
Conference
ISSN
2640-009X
ISBN
978-1-7281-3249-5
Citations
1
PageRank
0.35
References
11
Authors
5
Name          Order  Citations  PageRank
Ye Bai        1      7          5.52
Jiangyan Yi   2      19         17.99
Jianhua Tao   3      848        138.00
Zhengqi Wen   4      86         24.41
Bin Liu       5      191        35.02