Title
Context-Aware Neural Voice Activity Detection Using Auxiliary Networks For Phoneme Recognition, Speech Enhancement And Acoustic Scene Classification
Abstract
This paper proposes a novel, fully neural-network-based voice activity detection (VAD) method that estimates whether each input segment is speech or non-speech, even in very low signal-to-noise ratio (SNR) environments. Our innovation is to improve context-awareness of speech variability by introducing multiple auxiliary networks into the neural VAD framework. While previous studies reported that phonetic-aware auxiliary features extracted from a phoneme recognition network can improve VAD performance, none examined other auxiliary features that could enhance noise robustness. Thus, this paper presents a neural VAD that uses auxiliary features extracted not only from the phoneme recognition network but also from a speech enhancement network and an acoustic scene classification network. The last two networks are expected to improve context-awareness even in extremely low SNR environments, since they can extract de-noised speech awareness and noisy environment awareness. In addition, we expect that combining these multiple auxiliary features yields synergistic improvements in VAD performance. Experiments verify the superiority of the proposed method in very low SNR environments.
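The idea of feeding auxiliary features into a frame-level VAD can be illustrated with a minimal sketch. The abstract does not say how the features are combined, so plain concatenation followed by a single logistic output is an assumption here, not the authors' architecture. The three auxiliary networks are replaced by random, untrained tanh layers, and every name and dimension (`make_linear`, `build_vad`, `feat_dim`, `aux_dim`) is hypothetical.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Return a random tanh layer: a stand-in for a pretrained auxiliary network."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def layer(x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

    return layer


def build_vad(feat_dim=8, aux_dim=4, seed=0):
    """Build a toy VAD that scores one acoustic frame at a time."""
    rng = random.Random(seed)
    # Hypothetical stand-ins for the three auxiliary networks named in the abstract.
    aux_nets = [
        make_linear(feat_dim, aux_dim, rng),  # phoneme recognition features
        make_linear(feat_dim, aux_dim, rng),  # speech enhancement features
        make_linear(feat_dim, aux_dim, rng),  # acoustic scene features
    ]
    # Output weights over the concatenated vector (assumed combination scheme).
    out_w = [rng.uniform(-0.5, 0.5) for _ in range(feat_dim + 3 * aux_dim)]

    def vad(frame):
        # Concatenate the acoustic frame with every auxiliary feature vector.
        joint = list(frame)
        for net in aux_nets:
            joint.extend(net(frame))
        logit = sum(w * v for w, v in zip(out_w, joint))
        return 1.0 / (1.0 + math.exp(-logit))  # speech posterior in (0, 1)

    return vad


vad = build_vad()
frame_rng = random.Random(1)
# Five synthetic 8-dimensional frames in place of real acoustic features.
frames = [[frame_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
posteriors = [vad(f) for f in frames]
decisions = [p >= 0.5 for p in posteriors]  # True means "speech"
```

In a real system each stand-in layer would be a trained network and the output layer would be learned from labelled speech and non-speech frames. The sketch only shows where the auxiliary features enter the decision.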
Year
2019
DOI
10.23919/EUSIPCO.2019.8902703
Venue
2019 27th European Signal Processing Conference (EUSIPCO)
Field
Speech enhancement, Voice activity detection, Computer science, Speech recognition, Robustness (computer science), Artificial neural network, Phoneme recognition
DocType
Conference
ISSN
2076-1465
Citations
0
PageRank
0.34
References
0
Authors
6
Name              Order  Citations / PageRank
Ryo Masumura      1      2528.24
Kiyoaki Matsui    2      1 / 1.04
Koizumi Yuma      3      4111.75
Takaaki Fukutomi  4      1 / 0.70
Takanobu Oba      5      0 / 0.34
Yushi Aono        6      711.02