Speech Enhancement Using a Risk Estimation Approach - Citegraph

Paper Info

Title
Speech Enhancement Using a Risk Estimation Approach

Abstract
In this paper, we develop a risk estimation framework for speech enhancement, in which we optimize an unbiased estimate of the risk instead of the actual risk. The estimated risk is expressed solely as a function of the noisy observations and the noise statistics. Hence, the denoiser obtained by minimizing the risk estimate does not require a prior on the clean speech. State-of-the-art image denoising techniques optimize Stein's unbiased risk estimate (SURE), an unbiased estimate of the mean-squared error (MSE), to obtain the optimum denoising function. Although the MSE is a widely and successfully used distortion measure for signal denoising, in speech processing applications distortion measures such as the Itakura-Saito (IS), hyperbolic-cosine (cosh), and weighted cosh measures are known to be more perceptually relevant than the MSE. Taking this into account, we solve the speech denoising problem within the framework of perceptual risk estimation, wherein we derive unbiased estimates of speech-specific perceptual distortion measures and minimize them to obtain the corresponding denoising functions. We employ a DCT-domain pointwise shrinkage estimator for denoising, where the optimum shrinkage estimator is obtained by minimizing the perceptual risk estimate. We evaluate the performance of the risk estimation-based techniques through objective assessment in terms of segmental signal-to-noise ratio (SSNR), perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and signal-to-distortion ratio (SDR), and through subjective assessment by means of listening tests. Validation on several speech signals in real-world nonstationary noise scenarios and comparisons with benchmark techniques showed that, for input SNRs greater than 5 dB, the proposed method results in better denoising performance than several benchmark techniques.
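The SURE principle described above can be illustrated for the MSE case. For y = x + n with n ~ N(0, σ²I), Stein's lemma gives SURE(f) = ‖f(y) − y‖² + 2σ² Σₖ ∂fₖ/∂yₖ − Nσ², an unbiased estimate of E‖f(y) − x‖² that uses only y and σ. The following is a minimal sketch under stated assumptions: white Gaussian noise with known `sigma`, a single frame transformed by an orthonormal DCT-II, and soft thresholding as the pointwise shrinkage rule. Soft thresholding is swapped in here only because its SURE has a closed form; the abstract does not give the paper's shrinkage function or its perceptual risk estimates, so this is not the authors' estimator. All function names (`dct_matrix`, `soft_threshold`, `sure_soft`, `sure_shrink_frame`) are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ C.T == I, so the inverse transform is C.T."""
    k = np.arange(n)[:, None]                  # frequency index (rows)
    m = np.arange(n)[None, :]                  # time index (columns)
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    scale = np.full((n, 1), np.sqrt(2.0 / n))
    scale[0, 0] = np.sqrt(1.0 / n)             # the DC row uses a different normalization
    return scale * basis


def soft_threshold(y: np.ndarray, t: float) -> np.ndarray:
    """Pointwise shrinkage: sign(y) * max(|y| - t, 0)."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)


def sure_soft(y: np.ndarray, t: float, sigma: float) -> float:
    """Per-sample SURE of soft thresholding, assuming y = x + n with n ~ N(0, sigma^2 I).

    Uses only the noisy coefficients y and sigma; the clean signal never appears.
    """
    n = y.size
    fidelity = np.sum(np.minimum(y ** 2, t ** 2))      # ||f(y) - y||^2
    divergence = np.count_nonzero(np.abs(y) > t)       # sum_k d f_k / d y_k
    return float((fidelity + 2 * sigma ** 2 * divergence - n * sigma ** 2) / n)


def sure_shrink_frame(frame: np.ndarray, sigma: float) -> np.ndarray:
    """Denoise one frame: DCT, pick the SURE-minimizing threshold, shrink, invert."""
    c = dct_matrix(frame.size)
    y = c @ frame                      # orthonormal DCT keeps the noise white with variance sigma^2
    # SURE increases between breakpoints, so its minimum lies at t = 0 or some |y_k|.
    candidates = np.concatenate(([0.0], np.sort(np.abs(y))))
    risks = [sure_soft(y, t, sigma) for t in candidates]
    t_best = candidates[int(np.argmin(risks))]
    return c.T @ soft_threshold(y, t_best)


# Usage on a synthetic frame (hypothetical test signal, not speech).
rng = np.random.default_rng(0)
idx = np.arange(256)
clean = np.sin(2 * np.pi * 5 * idx / 256) + 0.5 * np.sin(2 * np.pi * 12 * idx / 256)
sigma = 0.3
noisy = clean + sigma * rng.standard_normal(idx.size)
denoised = sure_shrink_frame(noisy, sigma)
print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
print("denoised MSE:", np.mean((denoised - clean) ** 2))
```

Because the DCT is orthonormal, the risk measured on DCT coefficients equals the time-domain MSE, which is what lets a per-coefficient shrinkage be tuned by a coefficient-domain risk estimate. The perceptual variants in the paper replace the MSE by other distortion measures and require their own unbiased estimates.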
Among the risk estimation-based techniques, the quality of the denoised speech (measured in terms of PESQ and subjective listening scores) is higher for the perceptual risk-based techniques than for the MSE-based technique. Further, we want to emphasize that the proposed methodology is relatively simple from an implementation perspective, since the shrinkage estimators are easy to compute and do not require training, making it ideal for deployment in practical applications, particularly those involving hearing aids and mobile devices.
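To see how the perceptual measures differ from the MSE, the sketch below evaluates all three on positive values (for example, power-spectral values). It assumes the common forms d_IS = r − ln r − 1 and d_cosh = cosh(ln r) − 1 with r = x / x̂; the weighted cosh variant is omitted because its weights are not specified here. The helper names and test values are hypothetical, and the code shows only the distortion measures themselves, not the unbiased risk estimates the paper derives from them.

```python
import numpy as np


def mse(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Mean-squared error: symmetric in the difference and dependent on signal level."""
    return float(np.mean((x - x_hat) ** 2))


def itakura_saito(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Mean Itakura-Saito distortion r - ln(r) - 1 with r = x / x_hat (x, x_hat > 0)."""
    r = x / x_hat
    return float(np.mean(r - np.log(r) - 1.0))


def cosh_distortion(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Mean hyperbolic-cosine distortion cosh(ln(x / x_hat)) - 1 (x, x_hat > 0)."""
    return float(np.mean(np.cosh(np.log(x / x_hat)) - 1.0))


x = np.array([1.0, 4.0, 0.25])     # hypothetical positive spectral values
under = x / 2                      # estimate too small by a factor of 2
over = x * 2                       # estimate too large by a factor of 2

for label, x_hat in [("under", under), ("over", over)]:
    print(label, mse(x, x_hat), itakura_saito(x, x_hat), cosh_distortion(x, x_hat))

# Scale clean and estimated values together by 10: IS and cosh depend only on
# the ratio x / x_hat and stay the same, while the MSE grows by a factor of 100.
print(itakura_saito(10 * x, 10 * under), cosh_distortion(10 * x, 10 * under), mse(10 * x, 10 * under))
```

Running it shows that the IS measure penalizes underestimation more than overestimation by the same factor, the cosh measure treats both equally in the log ratio, and both ignore the overall level, whereas the MSE is dominated by the largest values.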

Year	DOI	Venue
2020	10.1016/j.specom.2019.11.001	Speech Communication
Keywords	Field	DocType
Speech enhancement,Unbiased risk estimation,Stein’s lemma,Perceptual distortion measure,Objective and subjective assessment	Wiener filter,Speech enhancement,Computer science,A priori and a posteriori,Speech recognition,Non-negative matrix factorization,Prior probability,Distortion,Intelligibility (communication),PESQ	Journal
Volume	ISSN	Citations
116	0167-6393	0
PageRank	References	Authors
0.34	0	3

Authors

Name	Order	Citations	PageRank
Jishnu Sadasivan	1	0	1.01
Chandra Sekhar Seelamantula	2	142	37.43
Nagarjuna Reddy Muraka	3	3	0.84
