Exploring Monaural Features for Classification-Based Speech Segregation - Citegraph

Paper Info

Title
Exploring Monaural Features for Classification-Based Speech Segregation

Abstract
Monaural speech segregation has been a very challenging problem for decades. By casting speech segregation as a binary classification problem, recent advances have been made in computational auditory scene analysis on segregation of both voiced and unvoiced speech. So far, pitch and amplitude modulation spectrogram have been used as two main kinds of time-frequency (T-F) unit level features in classification. In this paper, we expand T-F unit features to include gammatone frequency cepstral coefficients (GFCC), mel-frequency cepstral coefficients, relative spectral transform (RASTA) and perceptual linear prediction (PLP). Comprehensive comparisons are performed in order to identify effective features for classification-based speech segregation. Our experiments in matched and unmatched test conditions show that these newly included features significantly improve speech segregation performance. Specifically, GFCC and RASTA-PLP are the best single features in matched-noise and unmatched-noise test conditions, respectively. We also find that pitch-based features are crucial for good generalization to unseen environments. To further explore complementarity in terms of discriminative power, we propose to use a group Lasso approach to select complementary features in a principled way. The final combined feature set yields promising results in both matched and unmatched test conditions.
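The group Lasso selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a squared-error loss (the paper treats segregation as binary classification), a plain proximal-gradient solver, and hypothetical names (`group_soft_threshold`, `group_lasso`, `selected_groups`, and the `groups` mapping from a feature-set name to its column indices). It only shows how the penalty zeroes out whole feature groups at once.

```python
import numpy as np

def group_soft_threshold(w, groups, tau):
    """Block soft-thresholding: proximal operator of tau * sum_g ||w_g||_2."""
    out = np.zeros_like(w)
    for idx in groups.values():
        norm = np.linalg.norm(w[idx])
        if norm > tau:
            # Shrink the whole group toward zero; groups with norm <= tau stay at zero.
            out[idx] = (1.0 - tau / norm) * w[idx]
    return out

def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimise 0.5/n * ||Xw - y||^2 + lam * sum_g ||w_g||_2 by proximal gradient."""
    n, d = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = group_soft_threshold(w - step * grad, groups, step * lam)
    return w

def selected_groups(w, groups):
    """Names of the groups whose coefficients are not all zero."""
    return [name for name, idx in groups.items() if np.linalg.norm(w[idx]) > 0]

# Synthetic illustration (assumed data, unrelated to the paper's experiments):
# only the columns in "group_a" influence the target.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = X[:, :3] @ np.array([1.0, -2.0, 1.5])
groups = {"group_a": [0, 1, 2], "group_b": [3, 4, 5]}
w = group_lasso(X, y, groups, lam=0.5)
print(selected_groups(w, groups))
```

In a feature-selection setting, each group would hold all dimensions of one feature type, so a group that is driven to zero is dropped as a whole.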

Year: 2013
DOI: 10.1109/TASL.2012.2221459
Venue: IEEE Transactions on Audio, Speech, and Language Processing
DocType: Journal
Volume: 21
Issue: 2
ISSN: 1558-7916
Citations: 50
PageRank: 2.47
References: 16
Keywords: binary classification, voiced speech, speech processing, relative spectral transform, gammatone frequency cepstral coefficient, unvoiced speech, monaural feature exploration, gfcc, binary classification problem, amplitude modulation spectrogram, mel-frequency cepstral coefficient, time-frequency unit level feature, perceptual linear prediction, t-f unit level feature, monaural speech segregation, cepstral analysis, feature combination, matched testing, computational auditory scene analysis, amplitude modulation, group lasso approach, unmatched-noise test condition, plp, group lasso, classification-based speech segregation, pitch modulation spectrogram, computational auditory scene analysis (casa), time-frequency analysis, rasta, time frequency analysis
Field: Mel-frequency cepstrum, Speech processing, Pattern recognition, Binary classification, Spectrogram, Computer science, Speech recognition, Artificial intelligence, Time–frequency analysis, Discriminative model, Monaural, Computational auditory scene analysis
Authors
3

Name	Order	Citations	PageRank
Yu-Xuan Wang	1	650	32.68
Kun Han	2	161	8.43
DeLiang Wang	3	3933	362.87
