Title
Exploring Monaural Features for Classification-Based Speech Segregation
Abstract
Monaural speech segregation has been a challenging problem for decades. Casting speech segregation as a binary classification problem has led to recent advances in computational auditory scene analysis (CASA) for segregating both voiced and unvoiced speech. So far, pitch and the amplitude modulation spectrogram have been the two main kinds of time-frequency (T-F) unit-level features used in classification. In this paper, we expand the T-F unit features to include gammatone frequency cepstral coefficients (GFCC), mel-frequency cepstral coefficients (MFCC), relative spectral transform (RASTA), and perceptual linear prediction (PLP). We perform comprehensive comparisons to identify effective features for classification-based speech segregation. Our experiments in matched and unmatched test conditions show that the newly included features significantly improve speech segregation performance. Specifically, GFCC and RASTA-PLP are the best single features in matched-noise and unmatched-noise test conditions, respectively. We also find that pitch-based features are crucial for good generalization to unseen environments. To further explore how the features complement one another in discriminative power, we propose a group Lasso approach that selects complementary features in a principled way. The final combined feature set yields promising results in both matched and unmatched test conditions.
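The group Lasso idea mentioned in the abstract can be illustrated with a small, generic sketch. It is not the authors' implementation: the logistic loss, the proximal gradient solver, the penalty weight `lam`, the synthetic data, and the group names `group_a`, `group_b`, and `group_c` are all assumptions made for illustration. Each group stands in for one feature type computed per T-F unit, and the penalty either keeps or discards all coefficients of a group together.

```python
import numpy as np


def group_soft_threshold(w, threshold):
    """Proximal step of the group Lasso: shrink a whole coefficient block."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w


def fit_group_lasso_logistic(X, y, groups, lam=0.05, n_iter=300):
    """Logistic regression with a group Lasso penalty via proximal gradient.

    X: (n_samples, n_features) matrix, one row per (hypothetical) T-F unit.
    y: (n_samples,) binary labels in {0, 1}.
    groups: dict mapping a group name to the column indices it owns.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    # Step size 1/L, with L an upper bound on the Lipschitz constant
    # of the gradient of the mean logistic loss (weights plus bias).
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w = w - step * (X.T @ (p - y)) / n
        b = b - step * np.mean(p - y)
        for idx in groups.values():
            # Scale the penalty by sqrt(group size), a common convention.
            w[idx] = group_soft_threshold(w[idx], step * lam * np.sqrt(len(idx)))
    return w, b


def selected_groups(w, groups):
    """Names of the groups whose coefficient block is not exactly zero."""
    return [name for name, idx in groups.items() if np.linalg.norm(w[idx]) > 0]


# Synthetic demo: group_a and group_c carry signal, group_b is pure noise.
rng = np.random.default_rng(0)
groups = {"group_a": [0, 1, 2], "group_b": [3, 4, 5, 6], "group_c": [7, 8]}
X = rng.standard_normal((2000, 9))
true_w = np.zeros(9)
true_w[groups["group_a"]] = [2.0, -1.5, 1.0]
true_w[groups["group_c"]] = [1.5, -1.0]
prob = 1.0 / (1.0 + np.exp(-(X @ true_w)))
y = (rng.random(2000) < prob).astype(float)

w, b = fit_group_lasso_logistic(X, y, groups)
selected = selected_groups(w, groups)
print("selected groups:", selected)
```

Because the penalty acts on the norm of each block rather than on individual coefficients, a feature type is selected or dropped as a unit, which is what makes the approach suitable for choosing among whole feature sets.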
Year
2013
DOI
10.1109/TASL.2012.2221459
Venue
IEEE Transactions on Audio, Speech, and Language Processing
Keywords
binary classification, voiced speech, speech processing, relative spectral transform, gammatone frequency cepstral coefficient, unvoiced speech, monaural feature exploration, gfcc, binary classification problem, amplitude modulation spectrogram, mel-frequency cepstral coefficient, time-frequency unit level feature, perceptual linear prediction, t-f unit level feature, monaural speech segregation, cepstral analysis, feature combination, matched testing, computational auditory scene analysis, amplitude modulation, group lasso approach, unmatched-noise test condition, plp, group lasso, classification-based speech segregation, pitch modulation spectrogram, computational auditory scene analysis (casa), time-frequency analysis, rasta, time frequency analysis
Field
Mel-frequency cepstrum, Speech processing, Pattern recognition, Binary classification, Spectrogram, Computer science, Speech recognition, Artificial intelligence, Time–frequency analysis, Discriminative model, Monaural, Computational auditory scene analysis
DocType
Journal
Volume
21
Issue
2
ISSN
1558-7916
Citations
50
PageRank
2.47
References
16
Authors
3
Name          Order  Citations PageRank
Yu-Xuan Wang  1      65032.68
Kun Han       2      1618.43
DeLiang Wang  3      3933362.87