Title |
---|
Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction |
Abstract |
---|
Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems. Recent works have proposed using neural networks to learn word or utterance confidence scores for end-to-end ASR. In those studies, word confidence by itself does not model deletions, and utterance confidence does not take advantage of word-level training signals. This paper proposes to jointly learn word confidence, word deletion, and utterance confidence. Empirical results show that multi-task learning with all three objectives improves confidence metrics (NCE, AUC, RMSE) without the need for increasing the model size of the confidence estimation module. Using the utterance-level confidence for rescoring also decreases the word error rates on Google's Voice Search and Long-tail Maps datasets by 3-5% relative, without needing a dedicated neural rescorer. |
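The confidence metrics named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions of normalized cross entropy (NCE) and root-mean-square error (RMSE) over per-word confidences and 0/1 correctness labels. The function names and sample values are hypothetical and are not taken from the paper; AUC is omitted for brevity.

```python
import math


def nce(confidences, correct, eps=1e-12):
    """Normalized cross entropy of confidences against 0/1 correctness labels.

    Assumes the empirical accuracy is strictly between 0 and 1; otherwise the
    prior entropy is zero and NCE is undefined.
    """
    n = len(correct)
    p_c = sum(correct) / n  # empirical fraction of correct words (the prior)
    # Entropy of always predicting the prior p_c.
    h_prior = -(p_c * math.log(p_c) + (1 - p_c) * math.log(1 - p_c))
    # Cross entropy of the actual confidence scores (clamped to avoid log(0)).
    h_conf = -sum(
        math.log(max(p, eps)) if c else math.log(max(1 - p, eps))
        for p, c in zip(confidences, correct)
    ) / n
    return (h_prior - h_conf) / h_prior


def rmse(confidences, correct):
    """Root-mean-square error between confidences and 0/1 correctness labels."""
    n = len(correct)
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(confidences, correct)) / n)


# Hypothetical example: four words, three recognized correctly.
labels = [1, 1, 1, 0]
scores = [0.9, 0.9, 0.9, 0.1]
print(nce(scores, labels), rmse(scores, labels))
```

Under these definitions, higher NCE and lower RMSE indicate better confidence estimates, and scores that simply equal the prior accuracy yield an NCE of zero.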
Year | DOI | Venue |
---|---|---|
2021 | 10.21437/Interspeech.2021-1207 | Interspeech |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
David Qiu | 1 | 0 | 1.35 |
Yanzhang He | 2 | 64 | 16.36 |
Qiujia Li | 3 | 5 | 4.48 |
Yu Zhang | 4 | 442 | 41.79 |
Liangliang Cao | 5 | 1816 | 90.71 |
Ian McGraw | 6 | 253 | 24.41 |