Abstract
---

Text-only training and semi-supervised training on audio-only data have gained popularity recently due to the wide availability of unlabeled text and speech. In this work, we propose incorporating text-only and semi-supervised training into an attention-based deliberation model. By using text-only data to train a Bidirectional Encoder Representations from Transformers (BERT) model for the deliberation text encoder, and by leveraging large-scale text-to-speech and audio-only utterances through the joint acoustic and text decoder (JATD) and semi-supervised training, we achieve a 4%-12% relative WER reduction across various tasks compared to the baseline deliberation model. Compared to a state-of-the-art language model (LM) rescoring method, the deliberation model reduces Google Voice Search WER by 11% relative. We show that the deliberation model also achieves a positive human side-by-side evaluation against the state-of-the-art LM rescorer while maintaining reasonable endpointer latencies.
Year | DOI | Venue
---|---|---
2022 | 10.21437/INTERSPEECH.2022-243 | Conference of the International Speech Communication Association (INTERSPEECH)
DocType | Citations | PageRank
---|---|---
Conference | 0 | 0.34

References | Authors
---|---
0 | 7
Name | Order | Citations | PageRank
---|---|---|---
Ke Hu | 1 | 1 | 1.73 |
Tara N. Sainath | 2 | 3497 | 232.43 |
Yanzhang He | 3 | 64 | 16.36 |
Rohit Prabhavalkar | 4 | 163 | 22.56 |
Trevor Strohman | 5 | 0 | 2.70 |
Sepand Mavandadi | 6 | 0 | 1.35 |
Weiran Wang | 7 | 114 | 9.99 |