Title
Linear-scale filterbank for deep neural network-based voice activity detection
Abstract
Voice activity detection (VAD) is an important preprocessing module in many speech applications. Choosing appropriate features and model structures is a significant challenge and an active area of VAD research. Mel-scale features such as Mel-frequency cepstral coefficients (MFCCs) and log Mel-filterbank (LMFB) energies have been widely used in VAD as well as in speech recognition. Feature extraction on the Mel-frequency scale is among the most popular methods because it mimics how the human ear processes sound. However, for certain types of sound whose important characteristics lie mainly in the high-frequency range, a linear frequency scale may provide more information than the Mel scale. In this paper, we therefore propose a deep neural network (DNN)-based VAD system that uses linear-scale features. This study shows that linear-scale features, especially log linear-filterbank (LLFB) energies, can be used in a DNN-based VAD system and outperform LMFB for certain types of noise. Moreover, a combination of LMFB and LLFB can integrate the advantages of both features.
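The contrast between the two feature types can be illustrated with a minimal sketch. It is not the authors' implementation: the HTK-style Mel formula, the triangular filter shape, the 40-filter bank, the 16 kHz sampling rate, the 512-point FFT, and every function name below are assumptions chosen for illustration, since the abstract does not specify them. The only difference between the LMFB-style and LLFB-style banks here is whether the band edges are evenly spaced in Mel or in Hz.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style Mel formula (an assumption; the abstract does not name a variant).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def band_edges(n_filters, f_max_hz, scale):
    """Return n_filters + 2 edge frequencies in Hz, evenly spaced on `scale`."""
    n_points = n_filters + 2
    if scale == "linear":
        return [f_max_hz * i / (n_points - 1) for i in range(n_points)]
    if scale == "mel":
        mel_max = hz_to_mel(f_max_hz)
        return [mel_to_hz(mel_max * i / (n_points - 1)) for i in range(n_points)]
    raise ValueError("scale must be 'linear' or 'mel'")


def triangular_filterbank(edges_hz, n_fft, sample_rate):
    """Build one triangular filter per interior edge, sampled at the FFT bins."""
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, len(edges_hz) - 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_filterbank_energies(power_spectrum, bank, floor=1e-10):
    """Log of the filter-weighted power in each band (floored to avoid log(0))."""
    return [
        math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), floor))
        for row in bank
    ]


# Hypothetical settings: 40 filters, 16 kHz audio, 512-point FFT.
SAMPLE_RATE, N_FFT, N_FILTERS = 16000, 512, 40
edges_lin = band_edges(N_FILTERS, SAMPLE_RATE / 2, "linear")
edges_mel = band_edges(N_FILTERS, SAMPLE_RATE / 2, "mel")

# Count the filters whose center frequency lies above 4 kHz on each scale.
n_lin_high = sum(1 for f in edges_lin[1:-1] if f > 4000.0)
n_mel_high = sum(1 for f in edges_mel[1:-1] if f > 4000.0)

# Features for a flat (all-ones) power spectrum, one value per filter.
flat_spectrum = [1.0] * (N_FFT // 2 + 1)
llfb = log_filterbank_energies(
    flat_spectrum, triangular_filterbank(edges_lin, N_FFT, SAMPLE_RATE))
lmfb = log_filterbank_energies(
    flat_spectrum, triangular_filterbank(edges_mel, N_FFT, SAMPLE_RATE))
```

Under these assumed settings, the linear-scale bank places half of its filters above 4 kHz, while the Mel-scale bank places fewer there and concentrates its resolution at low frequencies. This is one way to read the abstract's point that linear-scale features may retain more information for sounds dominated by high-frequency content.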
Year
2017
DOI
10.1109/ICSDA.2017.8384446
Venue
2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA)
Keywords
Voice activity detection, deep neural network, linear-scale filterbank, Mel-scale filterbank, noise-independent training
DocType
Conference
ISBN
978-1-5386-3334-2
Citations
1
PageRank
0.36
References
8
Authors
4
Name            Order  Citations  PageRank
Youngmoon Jung  1      3          4.42
Younggwan Kim   2      17         6.11
Hyungjun Lim    3      31         7.66
Hoirin Kim      4      4          2.41