Abstract |
---|
Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice-operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and, as a consequence, do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is a quantitative analysis of a consumer speech recognition system on individuals who stutter, and production-oriented approaches for improving performance on common voice assistant tasks (e.g., "what is the weather?"). At baseline, this system introduces a significant number of insertion and substitution errors, resulting in intended speech Word Error Rates (isWER) that are 13.64% worse (absolute) for individuals with fluency disorders. We show that by simply tuning the decoding parameters in an existing hybrid speech recognition system, one can improve isWER by 24% (relative) for individuals with fluency disorders. Tuning these parameters translates to 3.6% better domain recognition and 1.7% better intent recognition relative to the default setup for the 18 study participants across all stuttering severities. |
Year | DOI | Venue |
---|---|---|
2021 | 10.21437/Interspeech.2021-2006 | Interspeech |

DocType | Citations | PageRank |
---|---|---|
Conference | 1 | 0.40 |
References | Authors |
---|---|
0 | 11 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vikramjit Mitra | 1 | 299 | 24.83 |
Zifang Huang | 2 | 1 | 0.73 |
Colin S. Lea | 3 | 3 | 2.13 |
Lauren Tooley | 4 | 1 | 0.40 |
Sarah Wu | 5 | 2 | 0.80 |
Darren Botten | 6 | 1 | 0.40 |
Ashwini Palekar | 7 | 1 | 0.40 |
Shrinath Thelapurath | 8 | 1 | 0.40 |
Panayiotis Georgiou | 9 | 1 | 0.40 |
Sachin Kajarekar | 10 | 1 | 2.09 |
Jeffrey P. Bigham | 11 | 2647 | 189.29 |