| Title |
|---|
| Context-Aware Neural Voice Activity Detection Using Auxiliary Networks For Phoneme Recognition, Speech Enhancement And Acoustic Scene Classification |
| Abstract |
|---|
| This paper proposes a novel, fully neural network based voice activity detection (VAD) method that classifies each segment as speech or non-speech even in very low signal-to-noise ratio (SNR) environments. Our innovation is to improve context-awareness of speech variability by introducing multiple auxiliary networks into the neural VAD framework. While previous studies reported that phonetic-aware auxiliary features extracted from a phoneme recognition network can improve VAD performance, none examined other auxiliary features that could enhance noise robustness. Thus, this paper presents a neural VAD that uses auxiliary features extracted not only from the phoneme recognition network but also from a speech enhancement network and an acoustic scene classification network. The latter two networks are expected to improve context-awareness even in extremely low SNR environments, since they can extract de-noised speech awareness and noisy-environment awareness. In addition, we expect that combining these multiple auxiliary features yields synergistic improvements in VAD performance. Experiments verify the superiority of the proposed method in very low SNR environments. |
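The abstract's core idea, concatenating per-frame auxiliary features from phoneme recognition, speech enhancement, and acoustic scene classification networks with the acoustic features before the VAD classifier, can be illustrated with a minimal numpy sketch. All dimensions and the one-layer sigmoid head are hypothetical stand-ins (the paper does not specify them here), and random vectors substitute for real network bottleneck outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not given in the abstract): 40-dim acoustic
# features per frame plus bottleneck embeddings from three auxiliary nets.
T, D_ACOUSTIC = 100, 40                # frames x acoustic feature dim
D_PHONE, D_ENH, D_SCENE = 32, 32, 16   # auxiliary embedding dims

acoustic = rng.standard_normal((T, D_ACOUSTIC))
phonetic = rng.standard_normal((T, D_PHONE))   # phoneme-recognition bottleneck
denoised = rng.standard_normal((T, D_ENH))     # speech-enhancement bottleneck
scene    = rng.standard_normal((T, D_SCENE))   # scene-classification bottleneck

# Context-aware input: concatenate acoustic and auxiliary features per frame.
x = np.concatenate([acoustic, phonetic, denoised, scene], axis=1)

# Toy one-layer VAD head producing frame-wise speech posteriors.
W = rng.standard_normal((x.shape[1], 1)) * 0.1
posterior = 1.0 / (1.0 + np.exp(-(x @ W)))     # sigmoid
is_speech = posterior.ravel() > 0.5            # frame-wise speech/non-speech

print(x.shape, is_speech.shape)
```

In the paper's actual framework the classifier is a trained neural network and the auxiliary embeddings come from pretrained networks; this sketch only shows how the per-frame feature concatenation widens the VAD input.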
| Year | DOI | Venue |
|---|---|---|
| 2019 | 10.23919/EUSIPCO.2019.8902703 | 2019 27th European Signal Processing Conference (EUSIPCO) |
| Field | DocType | ISSN |
|---|---|---|
| Speech enhancement, Voice activity detection, Computer science, Speech recognition, Robustness (computer science), Artificial neural network, Phoneme recognition | Conference | 2076-1465 |
| Citations | PageRank | References |
|---|---|---|
| 0 | 0.34 | 0 |
| Authors |
|---|
| 6 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Ryo Masumura | 1 | 25 | 28.24 |
| Kiyoaki Matsui | 2 | 1 | 1.04 |
| Koizumi Yuma | 3 | 41 | 11.75 |
| Takaaki Fukutomi | 4 | 1 | 0.70 |
| Takanobu Oba | 5 | 0 | 0.34 |
| Yushi Aono | 6 | 7 | 11.02 |