Abstract | ||
---|---|---|
In this paper we introduce a new family of environmental compensation algorithms called Multivariate Gaussian Based Cepstral Normalization (RATZ). RATZ assumes that the effects of unknown noise and filtering on speech features can be compensated by corrections to the mean and variance of components of Gaussian mixtures, and we provide an efficient procedure for estimating the correction factors. The RATZ algorithm can be implemented to work with or without the use of "stereo" development data that had been simultaneously recorded in the training and testing environments. "Blind" RATZ partially overcomes the loss of information that would have been provided by stereo training through the use of a more accurate description of how noisy environments affect clean speech. We evaluate the performance of the two RATZ algorithms using the CMU SPHINX-II system on the alphanumeric census database and compare their performance with that of previous environmental-robustness methods developed at CMU. The proposed methods perform compensation based on empirical comparisons, like MFCDCN, but use the more formal representation of probability densities and the optimal estimation procedures that were used in previous model-based procedures like CDCN. Nevertheless, there is no explicit model for environmental degradation (unlike the model-based approaches). We only assume that the environment modifies some of the parameters used to describe the feature distributions of clean speech. Our new techniques can exploit the information provided by stereo data if available. However, stereo databases are not always easy to collect. We will demonstrate that the representational structure of the algorithms permits nearly optimal compensation, even in the absence of stereo data. In Sec. 2 we describe the effects of environmental degradation on the probability density functions (pdfs) of the feature vectors used for recognition. The new algorithms are described in Sec. 3, and they are evaluated using simulated and real speech data in Sec. 4. |
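
The abstract's central idea, correcting the means and variances of Gaussian mixture components with the help of stereo data, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the estimator below (posterior-weighted averages of the noisy-minus-clean difference, diagonal covariances, and a posterior-weighted mean subtraction at compensation time) is an assumption, and all function and variable names are hypothetical.

```python
import numpy as np

def posteriors(x, weights, means, variances):
    """Component posteriors P(k | x_t) for a diagonal-covariance mixture.

    x: (T, D); weights: (K,); means, variances: (K, D). Returns (T, K).
    """
    diff = x[:, None, :] - means[None, :, :]
    log_lik = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
    log_p = np.log(weights) + log_lik
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def estimate_corrections(clean, noisy, weights, means, variances):
    """Assumed stereo estimator: per-component shifts of mean and variance."""
    post = posteriors(clean, weights, means, variances)      # (T, K)
    occ = np.maximum(post.sum(axis=0), 1e-10)[:, None]       # (K, 1) soft counts
    mean_shift = post.T @ (noisy - clean) / occ               # (K, D)
    noisy_means = means + mean_shift
    sq = (noisy[:, None, :] - noisy_means[None, :, :]) ** 2  # (T, K, D)
    noisy_vars = np.einsum("tk,tkd->kd", post, sq) / occ
    var_shift = noisy_vars - variances
    return mean_shift, var_shift

def compensate(noisy, weights, means, variances, mean_shift, var_shift):
    """Subtract the posterior-weighted mean shift from each noisy vector."""
    noisy_vars = np.maximum(variances + var_shift, 1e-6)     # keep variances positive
    post = posteriors(noisy, weights, means + mean_shift, noisy_vars)
    return noisy - post @ mean_shift
```

In this sketch the posteriors used for estimation come from the clean half of each stereo pair, while compensation uses posteriors computed under the corrected (noisy) mixture. A blind variant would have to estimate the same shifts from noisy data alone, which is not shown here.
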
Year | DOI | Venue |
---|---|---|
1995 | 10.1109/ICASSP.1995.479292 | 1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5 |
Keywords | Field | DocType |
variance,mean,gaussian processes,speech recognition,noise,robustness,databases,degradation,filtering,statistics | Alphanumeric,Normalization (statistics),Pattern recognition,Computer science,Cepstrum,Filter (signal processing),Speech recognition,Gaussian,Multivariate normal distribution,Artificial intelligence,Gaussian process,Filtering theory | Conference |
Citations | PageRank | References |
17 | 10.04 | 1 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Pedro J. Moreno | 1 | 1256 | 114.37 |
Bhiksha Raj | 2 | 2094 | 204.63 |
Evandro Gouvêa | 3 | 20 | 10.47 |
Richard M. Stern | 4 | 1663 | 406.79 |