Title
Voice Activity Detection Based on Time-Delay Neural Networks
Abstract
Voice activity detection (VAD) is an important preprocessing step in many speech applications, and context information is important for VAD. Time-delay neural networks (TDNNs) capture long context with few parameters. This paper investigates a TDNN-based VAD framework. A simple chunk-based decision method is proposed to smooth the raw posteriors and decide the boundary points of utterances. To evaluate decision performance, the intersection-over-union (IoU) metric is introduced from image object detection. Experiments are conducted on the Wall Street Journal (WSJ0) corpus. Frame classification performance is measured by the area under the curve (AUC) and the equal error rate (EER). Compared with a long short-term memory baseline, the TDNN-based system achieves an average relative EER reduction of 41.26% in the matched noise condition and an average relative AUC improvement of 3.82%. The proposed decision method achieves an average relative IoU improvement of 18.74% over the moving average method.
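The chunk-based decision and the segment-level IoU metric mentioned in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration, not the paper's exact procedure: the function names (chunk_decision, segment_iou), the chunk size of 30 frames, and the 0.5 threshold are assumptions. It averages frame-level speech posteriors over fixed chunks, thresholds each chunk into speech/non-speech segments, and scores the predicted segments against a reference with frame-level IoU, analogous to box IoU in object detection.

```python
import numpy as np

def chunk_decision(posteriors, chunk_size=30, threshold=0.5):
    """Chunk-based decision sketch: average frame-level speech posteriors
    over fixed-size chunks, threshold each chunk, and convert the chunk
    labels back into (start_frame, end_frame) speech segments."""
    n = len(posteriors)
    labels = np.zeros(n, dtype=bool)
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        labels[start:end] = posteriors[start:end].mean() >= threshold
    # Collect contiguous speech runs as (start, end) pairs.
    segments, seg_start = [], None
    for i, is_speech in enumerate(labels):
        if is_speech and seg_start is None:
            seg_start = i
        elif not is_speech and seg_start is not None:
            segments.append((seg_start, i))
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, n))
    return segments

def segment_iou(pred, ref):
    """Frame-level intersection-over-union between predicted and reference
    speech segments (lists of (start, end) frame indices)."""
    def to_mask(segments, length):
        mask = np.zeros(length, dtype=bool)
        for s, e in segments:
            mask[s:e] = True
        return mask
    length = max(e for _, e in list(pred) + list(ref))
    p, r = to_mask(pred, length), to_mask(ref, length)
    union = np.logical_or(p, r).sum()
    return np.logical_and(p, r).sum() / union if union > 0 else 0.0

# Example: noisy posteriors for 200 speech frames followed by 100 non-speech frames.
post = np.concatenate([np.random.uniform(0.6, 1.0, 200),
                       np.random.uniform(0.0, 0.4, 100)])
print(segment_iou(chunk_decision(post), [(0, 200)]))
```

Chunk averaging here stands in for the smoothing step described in the abstract; the paper's actual chunk length, threshold, and boundary-decision rule may differ.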
Year
2019
DOI
10.1109/APSIPAASC47483.2019.9023262
Venue
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords
voice activity detection, speech applications, time-delay neural networks capture long context information, VAD framework, smooth raw posteriors, decision performance, image object detection, Wall Street Journal corpus, equal error rate, TDNN based system, IoU relative improvement, EER relative reduction, chunk based decision method
DocType
Conference
ISSN
2640-009X
ISBN
978-1-7281-3249-5
Citations
1
PageRank
0.35
References
11
Authors
5
Name          Order  Citations  PageRank
Ye Bai        1      7          5.52
Jiangyan Yi   2      19         17.99
Jianhua Tao   3      848        138.00
Zhengqi Wen   4      86         24.41
Bin Liu       5      191        35.02