Title
Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise
Abstract
"Missing-feature" techniques for improving speech recognition accuracy are based on the blind determination of which cells in a spectrogram-like display of speech are corrupted by noise or other types of disturbance (and hence are "missing"). In this paper we present three new approaches that improve the speech recognition accuracy obtained using missing-feature techniques. Previous studies (e.g., Seltzer et al., 2004) found that Bayesian approaches to missing-feature classification are effective in ameliorating the effects of various types of additive noise. While Seltzer et al. primarily used white noise to train their Bayesian classifier, we have found that white noise is not the best training signal when the testing environment contains noise with greater spectral and/or temporal variation. The first innovation introduced in this paper, referred to as frequency-dependent classification, performs classification independently in each of the frequency bands in which the incoming speech is analyzed, using parallel sets of frequency-dependent features. The second innovation, referred to as colored-noise generation using multi-band partitioning, uses masking noises with artificially introduced spectral and temporal variation to train the Bayesian classifier that determines which spectro-temporal components of incoming speech are corrupted by noise in unknown testing environments. The third innovation is an adaptive method for estimating the a priori values used by the mask classifier, which determines whether a particular time-frequency segment of the test data should be considered reliable. It is shown that these innovations improve speech recognition accuracy on a small-vocabulary task when missing-feature restoration is applied to incoming speech corrupted by additive noise of an unknown nature, especially at lower signal-to-noise ratios.
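The abstract names the three techniques but does not give their features, densities, or update equations. What follows is a minimal, hypothetical Python sketch of the general ideas, not the authors' implementation. It assumes: one scalar feature per time-frequency cell (here simply the mixture log power, chosen only for illustration); one two-class Bayesian classifier per frequency band with single Gaussian class-conditional densities (a simplification of the classifiers used in this line of work); a colored-noise masker built by cutting the band axis into random partitions whose gains are redrawn every few frames; the log-max approximation for mixing speech and noise; and, for prior adaptation, an EM-style re-estimation of P(reliable) on unlabeled test data, which is a standard technique and not necessarily the paper's adaptive method. All names (`colored_noise`, `BandClassifier`, `estimate_mask`) are hypothetical.

```python
import math
import random


def colored_noise(n_frames, n_bands, n_parts, seg_len, rng, spread_db=10.0):
    """Hypothetical multi-band colored-noise masker, as log power in dB.

    Every seg_len frames the band axis is cut into n_parts contiguous
    partitions at random boundaries, and each partition gets a random gain
    (spectral variation that also changes over time). A unit-variance
    Gaussian term stands in for the white-noise fluctuation (a simplification).
    """
    frames = []
    for t in range(n_frames):
        if t % seg_len == 0:  # new segment: new partition boundaries and gains
            cuts = sorted(rng.sample(range(1, n_bands), n_parts - 1))
            edges = [0] + cuts + [n_bands]
            gains = [rng.uniform(-spread_db, spread_db) for _ in range(n_parts)]
        row = []
        for p in range(n_parts):
            for _ in range(edges[p], edges[p + 1]):
                row.append(gains[p] + rng.gauss(0.0, 1.0))
        frames.append(row)
    return frames


def fit_gaussian(values):
    """Maximum-likelihood mean and variance (variance floored above zero)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)


def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


class BandClassifier:
    """Reliable-vs-corrupted Bayesian classifier for ONE frequency band."""

    def __init__(self, reliable_feats, corrupted_feats):
        self.params = {True: fit_gaussian(reliable_feats),
                       False: fit_gaussian(corrupted_feats)}
        n_r, n_c = len(reliable_feats), len(corrupted_feats)
        self.prior_reliable = n_r / (n_r + n_c)  # prior seen in training

    def posterior_reliable(self, x, prior=None):
        """P(reliable | x) via Bayes' rule, computed stably in the log domain."""
        p = self.prior_reliable if prior is None else prior
        lr = math.log(p) + log_gauss(x, *self.params[True])
        lc = math.log(1 - p) + log_gauss(x, *self.params[False])
        m = max(lr, lc)
        return math.exp(lr - m) / (math.exp(lr - m) + math.exp(lc - m))

    def adapt_prior(self, test_feats, iterations=10):
        """EM re-estimation of P(reliable): prior <- mean posterior on test data."""
        p = self.prior_reliable
        for _ in range(iterations):
            p = sum(self.posterior_reliable(x, p) for x in test_feats) / len(test_feats)
            p = min(max(p, 1e-3), 1 - 1e-3)  # keep log() finite
        return p


def estimate_mask(classifiers, features, adapt=True):
    """features[t][b] -> mask[t][b] (True = reliable), classified band by band."""
    mask = [[False] * len(classifiers) for _ in features]
    for b, clf in enumerate(classifiers):
        column = [frame[b] for frame in features]
        prior = clf.adapt_prior(column) if adapt else clf.prior_reliable
        for t, x in enumerate(column):
            mask[t][b] = clf.posterior_reliable(x, prior) >= 0.5
    return mask


if __name__ == "__main__":
    rng = random.Random(0)
    n_frames, n_bands = 200, 8
    # Synthetic "speech" log power (dB): purely illustrative, not real speech.
    speech = [[rng.gauss(0.0, 8.0) for _ in range(n_bands)] for _ in range(n_frames)]
    noise = colored_noise(n_frames, n_bands, n_parts=3, seg_len=20, rng=rng)
    # Log-max mixture; the oracle label says whether speech dominates the cell.
    mix = [[max(s, n) for s, n in zip(sr, nr)] for sr, nr in zip(speech, noise)]
    oracle = [[s > n for s, n in zip(sr, nr)] for sr, nr in zip(speech, noise)]
    classifiers = []
    for b in range(n_bands):
        rel = [mix[t][b] for t in range(n_frames) if oracle[t][b]]
        cor = [mix[t][b] for t in range(n_frames) if not oracle[t][b]]
        classifiers.append(BandClassifier(rel, cor))
    mask = estimate_mask(classifiers, mix)
    agree = sum(m == o for mr, orow in zip(mask, oracle) for m, o in zip(mr, orow))
    print(f"mask agrees with oracle on {agree / (n_frames * n_bands):.2%} of cells")
```

Keeping one classifier per band lets each band learn its own class densities and, with `adapt_prior`, its own prior, which is the intuition behind frequency-dependent classification. The demo trains and evaluates on the same synthetic mixture, so it only checks that the pieces fit together; it says nothing about recognition accuracy.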
Year
2011
DOI
10.1016/j.specom.2010.08.005
Venue
Speech Communication
Keywords
bayesian classifier, robust speech recognition, unknown background noise, frequency-dependent classification, speech recognition accuracy, masking noise, multi-band partition method, temporal variation, independent classification, mask classification, colored-noise masker generation, missing-feature reconstruction, incoming speech, mask classifier, white noise, additive noise, frequency-dependent mask classification, speech recognition, bayesian approach, signal to noise ratio, colored noise, time frequency
Field
Background noise, Colors of noise, Pattern recognition, Naive Bayes classifier, Computer science, A priori and a posteriori, White noise, Speech recognition, Artificial intelligence, Missing data, Classifier (linguistics), Bayesian probability
DocType
Journal
Volume
53
Issue
1
ISSN
Speech Communication
Citations
9
PageRank
0.64
References
15
Authors
2
Name, Order, Citations, PageRank
Wooil Kim, 1, 120, 16.95
Richard M. Stern, 2, 1663, 406.79