Title
Learning Word-Level Confidence for Subword End-to-End ASR
Abstract
We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR). Although prior work has proposed training auxiliary confidence models for ASR systems, these approaches do not extend naturally to systems that use word pieces (WP) as their vocabulary. In particular, ground-truth WP correctness labels are needed to train confidence models, but the non-unique tokenization from words to WPs causes inaccurate labels to be generated. This paper proposes and studies two confidence models of increasing complexity to solve this problem. The final model uses self-attention to learn word-level confidence directly, without requiring subword tokenization, and exploits full-context features from multiple hypotheses to improve confidence accuracy. Experiments on Voice Search and long-tail test sets show that standard confidence metrics (e.g., NCE, AUC, RMSE) improve substantially. The proposed confidence module also enables a model-selection approach that combines an on-device E2E model with a server-side hybrid model to address the E2E model's rare-word recognition problem.
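As a minimal illustration (not the paper's code) of the label-mismatch problem the abstract describes: when the hypothesis and the reference happen to use different, equally valid word-piece tokenizations of the same word, a standard edit-distance alignment at the WP level can mark every piece of a correctly recognized word as an error. The tokenizations and helper below are hypothetical examples chosen for the sketch.

```python
# Hypothetical sketch of why WP-level correctness labels can be misleading:
# a correctly recognized word gets all-"incorrect" WP labels because the
# hypothesis and reference were tokenized into different word-piece sequences.

def align_labels(hyp, ref):
    """Levenshtein-align hyp tokens to ref tokens; return a boolean per hyp
    token that is True only when the token is aligned as an exact match
    (the usual way WP correctness labels are generated)."""
    m, n = len(hyp), len(ref)
    # dp[i][j] = edit distance between hyp[:i] and ref[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion from hyp
                           dp[i][j - 1] + 1,          # insertion into hyp
                           dp[i - 1][j - 1] + cost)   # match / substitution
    # Backtrace: a hyp token is labeled correct iff it aligns as an exact match.
    labels, i, j = [False] * m, m, n
    while i > 0:
        diag_cost = 0 if (j > 0 and hyp[i - 1] == ref[j - 1]) else 1
        if j > 0 and dp[i][j] == dp[i - 1][j - 1] + diag_cost:
            labels[i - 1] = hyp[i - 1] == ref[j - 1]
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            labels[i - 1] = False  # treated as an insertion error
            i -= 1
        else:
            j -= 1
    return labels

# "morning" is recognized correctly, but the two sides tokenize it differently.
hyp_wp = ["_mor", "ning"]   # hypothesis word pieces (example tokenization)
ref_wp = ["_morning"]       # reference word pieces (another valid tokenization)

word_correct = "".join(hyp_wp) == "".join(ref_wp)
wp_labels = align_labels(hyp_wp, ref_wp)

print(word_correct)  # True: the word itself is right
print(wp_labels)     # [False, False]: every WP is flagged as wrong
```

This is exactly the inaccuracy the paper sidesteps by predicting confidence at the word level instead of aggregating per-word-piece labels.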
Year
2021
DOI
10.1109/ICASSP39728.2021.9413966
Venue
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Keywords
Automatic speech recognition, confidence, calibration, transformer, attention-based end-to-end models
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
12
Name                 Order  Citations  PageRank
David Qiu            1      0          0.68
Qiujia Li            2      5          4.48
Yanzhang He          3      64         16.36
Yu Zhang             4      442        41.79
Bo Li                5      206        42.46
Liangliang Cao       6      1816       90.71
Rohit Prabhavalkar   7      163        22.56
Deepti Bhatia        8      0          0.34
Wei Li               9      4361       40.67
Ke Hu                10     1          1.73
Tara N. Sainath      11     3497       232.43
Ian McGraw           12     253        24.41