Title
Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis
Abstract
Spectro-temporal modulations are believed to mediate the analysis of speech sounds in the human primary auditory cortex. Inspired by humans’ robustness in comprehending speech in challenging acoustic environments, we propose an intrusive speech intelligibility prediction (SIP) algorithm, wSTMI, for normal-hearing listeners based on spectro-temporal modulation analysis (STMA) of the clean and degraded speech signals. In the STMA, each of 55 modulation frequency channels contributes an intermediate intelligibility measure. A sparse linear model with parameters optimized using Lasso regression results in combining the intermediate measures of 8 of the most salient channels for SIP. In comparison with a suite of 10 SIP algorithms, wSTMI performs consistently well across 13 datasets, which together cover degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, and speech interruption. We show that the optimized parameters of wSTMI may be interpreted in terms of modulation transfer functions of the human auditory system. Thus, the proposed approach offers evidence affirming previous studies of the perceptual characteristics underlying speech signal intelligibility.
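The channel-selection step described in the abstract (55 per-channel measures reduced to 8 by a Lasso-fitted sparse linear model) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic Gaussian stand-ins for the intermediate intelligibility measures, and the channel indices, weights, penalty `lam`, and the helper names `soft_threshold` and `lasso_coordinate_descent` are all assumptions chosen for illustration. The solver is a plain coordinate-descent Lasso written with the standard library only.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    columns = [[X[i][j] for i in range(n)] for j in range(p)]
    col_power = [sum(v * v for v in col) / n for col in columns]
    weights = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            old = weights[j]
            # Correlation of column j with the residual that excludes column j.
            rho = sum(c * r for c, r in zip(col, residual)) / n + col_power[j] * old
            new = soft_threshold(rho, lam) / col_power[j]
            if new != old:
                delta = new - old
                for i in range(n):
                    residual[i] -= delta * col[i]
                weights[j] = new
    return weights


# Synthetic stand-in data (assumption): 300 items, 55 "channels",
# of which 8 hypothetical channels actually drive the target score.
rng = random.Random(0)
n_items, n_channels = 300, 55
true_channels = [3, 9, 14, 20, 27, 33, 41, 50]
true_weights = [0.0] * n_channels
for k, ch in enumerate(true_channels):
    true_weights[ch] = 1.0 if k % 2 == 0 else -0.8

X = [[rng.gauss(0.0, 1.0) for _ in range(n_channels)] for _ in range(n_items)]
y = [sum(w * x for w, x in zip(true_weights, row)) + rng.gauss(0.0, 0.05) for row in X]

weights = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("channels with nonzero weight:", selected)
```

The L1 penalty is what drives most weights to exactly zero, so the fitted model combines only a small subset of channels. In practice the penalty would be tuned (for example by cross-validation) rather than fixed at an arbitrary value as it is here.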
Year
2021
DOI
10.1109/TASLP.2020.3039929
Venue
IEEE/ACM Transactions on Audio, Speech and Language Processing
Keywords
Modulation, Speech processing, Frequency modulation, Degradation, Indexes, Spectrogram, Prediction algorithms, Spectro-temporal modulation, speech intelligibility, speech quality model
DocType
Journal
Volume
29
Issue
1
ISSN
2329-9290
Citations
0
PageRank
0.34
References
15
Authors
4
Name            Order  Citations  PageRank
Amin Edraki     1      0          0.34
W.-Y. Chan      2      114        18.25
Jesper Jensen   3      1548       133.47
Daniel Fogerty  4      0          0.34