A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition - Citegraph

Paper Info

Title
A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition

Abstract
This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in various types of background noise conditions. A conventional mask estimation method based on spectral subtraction degrades performance, due to incorrect estimation of the noise signal which fails to accurately represent the variations of background noise during the incoming speech utterance. The proposed mask estimation method utilizes a Posterior-based Representative Mean (PRM) estimate for determining the reliability of the input speech spectral components, which is obtained as a weighted sum of the mean parameters of the speech model using the posterior probability. To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for a feature compensation method. Experimental results demonstrate that the proposed mask estimation method provides more separable distributions for the reliable/unreliable component classifier compared to the conventional mask estimation method. The recognition performance is evaluated using the Aurora 2.0 framework over various types of background noise conditions and the CU-Move real-life in-vehicle corpus. The performance evaluation shows that the proposed mask estimation method is considerably more effective at increasing speech recognition performance in various types of background noise conditions, compared to the conventional mask estimation method which is based on spectral subtraction. By employing the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +23.41% and +9.45% average relative improvements in word error rate for all four types of noise conditions and CU-Move corpus, respectively, compared to conventional mask estimation methods.
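
As an illustration of the "weighted sum of the mean parameters of the speech model using the posterior probability" described in the abstract, the following is a minimal sketch only, not the authors' implementation. It assumes diagonal-covariance Gaussian mixture models, that the posteriors are computed from the noise-corrupted model given the noisy input, and that they weight the means of a clean speech model. All function and parameter names (`prm_estimate`, `noisy_means`, `clean_means`, and so on) are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def prm_estimate(x, weights, noisy_means, noisy_vars, clean_means):
    """Posterior-weighted sum of the clean-model means (assumed form).

    Posteriors come from the noise-corrupted model (assumption); the
    weighted sum runs over the clean speech model means (assumption).
    """
    post = component_posteriors(x, weights, noisy_means, noisy_vars)
    dim = len(x)
    return [
        sum(p * mean[d] for p, mean in zip(post, clean_means))
        for d in range(dim)
    ]
```

In this sketch, an observation that lies close to one noisy-model component yields an estimate near that component's clean mean, while an observation equidistant from two components yields the average of their clean means. How the estimate is then used to classify spectral components as reliable or unreliable is not specified in the abstract and is left out here.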

Year	DOI	Venue
2011	10.1109/TASL.2010.2091633	IEEE Transactions on Audio, Speech & Language Processing
Keywords	Field	DocType
aurora 2.0 framework,novel mask estimation method,speech recognition,speech intelligibility,speech recognition performance,employing posterior-based representative mean,speech utterance,posterior probability,prm,posterior-based representative mean estimate,spectral subtraction,various type,estimation theory,proposed prm-based mask estimation,missing-feature,prm-based mask estimation method,proposed mask estimation method,conventional mask estimation method,background noise,posterior-based representative mean (prm) estimate,speech spectral component,incorrect estimation,missing-feature reconstruction,robust speech recognition,background noise condition,feature compensation method,mask estimation,noise-corrupted speech model,missing-feature speech recognition,model combination method,probability,word error rate	Speech processing,Background noise,Noise (signal processing),Pattern recognition,Computer science,Word error rate,Posterior probability,Error detection and correction,Speech recognition,Artificial intelligence,Estimation theory,Intelligibility (communication)	Journal
Volume	Issue	ISSN
19	5	1558-7916
Citations	PageRank	References
6	0.51	25
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Wooil Kim	1	120	16.95
John H. L. Hansen	2	3215	365.75
