Title
Speech Enhancement Using a Risk Estimation Approach
Abstract
In this paper, we develop a risk estimation framework for speech enhancement, in which we optimize an unbiased estimate of the risk instead of the actual risk. The estimated risk is expressed solely as a function of the noisy observations and the noise statistics. Hence, the denoiser obtained by minimizing the risk estimate does not require a clean speech prior. State-of-the-art image denoising techniques optimize Stein’s unbiased risk estimate (SURE), which is an unbiased estimate of the mean-squared error (MSE), to obtain the optimum denoising function. Even though the MSE is a widely and successfully used distortion measure for signal denoising, in speech processing applications, distortion measures such as Itakura-Saito (IS), hyperbolic-cosine (cosh), and weighted cosh are known to be more perceptually relevant than the MSE. Taking this into account, we solve the speech denoising problem within the framework of perceptual risk estimation, wherein we derive unbiased estimates of speech-specific perceptual distortion measures and minimize them to obtain the corresponding denoising functions. We employ a discrete cosine transform (DCT)-domain pointwise shrinkage estimator for denoising, where the optimum shrinkage estimator is obtained by minimizing the perceptual risk estimate. We evaluate the performance of the risk estimation-based techniques through objective assessment in terms of segmental signal-to-noise ratio (SSNR), perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and signal-to-distortion ratio (SDR), and through subjective assessment by means of listening tests. Validation on several speech signals in real-world nonstationary noise scenarios showed that, for input SNR greater than 5 dB, the proposed method results in better denoising performance than several benchmark techniques. Among the risk estimation-based techniques, the quality of the denoised speech (measured in terms of PESQ and subjective listening scores) is higher for the perceptual risk-based techniques than for the MSE-based technique. Further, we emphasize that the proposed methodology is relatively simple from an implementation perspective, since the shrinkage estimators are easy to compute, and that it does not require training, making it ideal for deployment in practical applications, particularly those involving hearing aids and mobile devices.
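As a rough illustration of the idea in the abstract, the following sketch applies a pointwise shrinkage gain to the DCT coefficients of one frame, choosing the gain by minimizing an unbiased estimate of the MSE. It covers only the MSE case, not the perceptual (IS, cosh, weighted cosh) risks, and it is not the authors' implementation. The function names, the assumption of additive white Gaussian noise with known variance `sigma2`, the per-coefficient minimization that treats the gain as a fixed multiplier, and the clipping of the gain to [0, 1] are all assumptions made for this example.

```python
import math


def dct_ortho(x):
    """Orthonormal DCT-II of a list of floats (O(N^2), fine for short frames)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_ortho(c):
    """Inverse of dct_ortho (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def sure_mse(a, y, sigma2):
    """Unbiased estimate of E[(a*y - x)^2] for a fixed gain a,
    assuming y = x + w with w ~ N(0, sigma2). Depends only on y and sigma2."""
    return (a - 1.0) ** 2 * y * y + 2.0 * sigma2 * a - sigma2


def sure_shrinkage(y, sigma2):
    """Gain that minimizes sure_mse over a, clipped to [0, 1] (assumed here)."""
    if y == 0.0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - sigma2 / (y * y)))


def denoise_frame(noisy, sigma2):
    """Shrink each DCT coefficient of one frame and transform back.
    With an orthonormal DCT, white noise keeps variance sigma2 per coefficient."""
    coeffs = dct_ortho(noisy)
    shrunk = [sure_shrinkage(c, sigma2) * c for c in coeffs]
    return idct_ortho(shrunk)
```

Calling `denoise_frame(frame, sigma2)` on a short list of samples returns a list of the same length. The gain uses only the noisy coefficient and the noise variance, which matches the abstract's point that no clean speech prior is needed. A practical system would also need framing, overlap-add, and a noise variance estimate, none of which is shown here.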
Year: 2020
DOI: 10.1016/j.specom.2019.11.001
Venue: Speech Communication
Keywords: Speech enhancement, Unbiased risk estimation, Stein’s lemma, Perceptual distortion measure, Objective and subjective assessment
Field: Wiener filter, Speech enhancement, Computer science, A priori and a posteriori, Speech recognition, Non-negative matrix factorization, Prior probability, Distortion, Intelligibility (communication), PESQ
DocType: Journal
Volume: 116
ISSN: 0167-6393
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name | Order | Citations | PageRank
Jishnu Sadasivan | 1 | 0 | 1.01
Chandra Sekhar Seelamantula | 2 | 142 | 37.43
Nagarjuna Reddy Muraka | 3 | 3 | 0.84