Abstract | ||
---|---|---|
Speaker variability is known to have an adverse impact on speech systems that process linguistic content, such as speech and language recognition. However, changes in an individual's speech production due to stress and emotion have a similarly detrimental effect on speaker recognition, as they introduce a mismatch with the speaker models, which are typically trained on modal speech. The focus of this study is on the analysis of stress-induced variations in speech and the design of an automatic stress level assessment scheme that could be used in directing stress-dependent acoustic models or normalization strategies. Current stress detection methods typically employ a binary decision based on whether or not the speaker is under stress. In reality, the amount of stress in individuals varies and can change gradually. Using speech and biometric data collected in a real-world, variable-stress-level law enforcement training scenario, this study considers two methods for stress level assessment. The first approach uses a nearest neighbor clustering scheme at the vowel token and sentence levels to classify speech data into three levels of stress. The second approach employs Euclidean distance metrics within the multi-dimensional feature space to provide real-time stress level tracking capability. Evaluations on audio data confirmed by biometric readings show both methods to be effective in assessing stress level within a speaker (average accuracy of 55.6 % in a 3-way classification task). In addition, the impact of high-level stress on in-set speaker recognition is evaluated and shown to reduce the accuracy from 91.7 % (low/mid stress) to 21.4 % (high-level stress). |
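
The abstract's second approach, Euclidean distance in a multi-dimensional feature space, can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional feature vectors, the per-level centroids, and the helper names `level_centroids` and `assess_level` are all assumptions made for illustration, and the numbers are toy data rather than values from the FLETC corpus.

```python
import math

def level_centroids(labelled):
    """Average the feature vectors of each stress level (hypothetical helper)."""
    centroids = {}
    for level, vectors in labelled.items():
        dim = len(vectors[0])
        centroids[level] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return centroids

def assess_level(vector, centroids):
    """Return the stress level whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda level: math.dist(vector, centroids[level]))

# Toy training data: made-up 2-D features for three stress levels.
train = {
    "low": [[0.0, 0.0], [0.2, 0.1]],
    "mid": [[1.0, 1.0], [1.2, 0.9]],
    "high": [[2.0, 2.1], [2.2, 1.9]],
}
centroids = level_centroids(train)

# Assess a short sequence of incoming feature vectors, one per frame or token.
stream = [[0.1, 0.0], [1.1, 1.0], [2.1, 2.0]]
track = [assess_level(v, centroids) for v in stream]
print(track)
```

Assigning each incoming vector to the nearest centroid is only one way to turn distances into a stress level; the paper's own features and decision rule may differ.
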
Year | DOI | Venue |
---|---|---|
2012 | 10.1007/s10772-012-9165-1 | I. J. Speech Technology |
Keywords | Field | DocType |
Stress assessment from speech, FLETC Corpus, TEO operator | Feature vector, Normalization (statistics), Pattern recognition, Computer science, Euclidean distance, Speech recognition, Speaker recognition, Speaker diarisation, Artificial intelligence, Biometrics, Cluster analysis, Speech production | Journal |
Volume | Issue | ISSN |
15 | 3 | 1381-2416 |
Citations | PageRank | References |
3 | 0.38 | 13 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
John H. L. Hansen | 1 | 3215 | 365.75 |
Evan Ruzanski | 2 | 21 | 3.89 |
Hynek Boril | 3 | 4 | 1.06 |
James Meyerhoff | 4 | 38 | 3.88 |