Title
Multivariate-Gaussian-Based Cepstral Normalization For Robust Speech Recognition
Abstract
In this paper we introduce a new family of environmental compensation algorithms called Multivariate-Gaussian-Based Cepstral Normalization (RATZ). RATZ assumes that the effects of unknown noise and filtering on speech features can be compensated by corrections to the mean and variance of components of Gaussian mixtures, and an efficient procedure for estimating the correction factors is provided. The RATZ algorithm can be implemented to work with or without the use of "stereo" development data that had been simultaneously recorded in the training and testing environments. "Blind" RATZ partially overcomes the loss of information that would have been provided by stereo training through the use of a more accurate description of how noisy environments affect clean speech. We evaluate the performance of the two RATZ algorithms using the CMU SPHINX-II system on the alphanumeric census database and compare their performance with that of previous environmental-robustness algorithms developed at CMU.

The proposed methods perform compensation based on empirical comparisons, like MFCDCN, but use the more formal representation of probability densities and the optimal estimation procedures that were used in previous model-based procedures like CDCN. Nevertheless, there is no explicit model for environmental degradation (unlike the model-based approaches). We only assume that the environment modifies some of the parameters used to describe the feature distributions of clean speech. Our new techniques can exploit the information provided by stereo data if available. However, stereo databases are not always easy to collect. We will demonstrate that the representational structure of the algorithms permits nearly optimal compensation, even in the absence of stereo data. In Sec. 2 we describe the effects of environmental degradation on the probability density functions (pdfs) of the feature vectors used for recognition. The new algorithms are described in Sec. 3, and they are evaluated using simulated and real speech data in Sec. 4.
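The idea of compensating through corrections to the mean and variance of each Gaussian mixture component can be illustrated with a small sketch for the stereo case. This is not the paper's exact procedure: it assumes scalar features, a clean-speech mixture that is already trained, and posterior-weighted averages as the estimator. All function and variable names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given a clean sample x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def estimate_corrections(clean, noisy, weights, means, variances):
    """Estimate per-component mean and variance corrections from stereo pairs.

    Assumption: each component receives nonzero posterior mass from the data.
    """
    num_components = len(weights)
    post_all = [posteriors(x, weights, means, variances) for x in clean]

    # Posterior-weighted average of the clean-to-noisy shift, per component.
    occupancy = [sum(p[k] for p in post_all) for k in range(num_components)]
    mean_corr = [
        sum(p[k] * (y - x) for p, x, y in zip(post_all, clean, noisy)) / occupancy[k]
        for k in range(num_components)
    ]

    # Variance of the noisy data around the shifted mean, minus the clean variance.
    var_corr = [
        sum(p[k] * (y - means[k] - mean_corr[k]) ** 2 for p, y in zip(post_all, noisy))
        / occupancy[k]
        - variances[k]
        for k in range(num_components)
    ]
    return mean_corr, var_corr


if __name__ == "__main__":
    # Toy stereo data: the "noisy" channel is the clean one shifted by 0.5.
    clean = [-2.1, -1.9, 2.0, 2.2]
    noisy = [x + 0.5 for x in clean]
    print(estimate_corrections(clean, noisy, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]))
```

Under these assumptions, the compensated distribution for component k would have mean `means[k] + mean_corr[k]` and variance `variances[k] + var_corr[k]`. The blind variant described in the abstract has no stereo pairs, so it would have to estimate the corrections from the noisy data alone.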
Year
1995
DOI
10.1109/ICASSP.1995.479292
Venue
1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5
Keywords
variance, mean, gaussian processes, speech recognition, noise, robustness, databases, degradation, filtering, statistics
Field
Alphanumeric, Normalization (statistics), Pattern recognition, Computer science, Cepstrum, Filter (signal processing), Speech recognition, Gaussian, Multivariate normal distribution, Artificial intelligence, Gaussian process, Filtering theory
DocType
Conference
Citations
17
PageRank
10.04
References
1
Authors
4
Name               Order  Citations  PageRank
Pedro J. Moreno    1      1256       114.37
Raj, Bhiksha       2      2094       204.63
Evandro Gouvêa     3      20         10.47
Richard M. Stern   4      1663       406.79