Title
Rescoring Teacher Outputs with Decoded Utterances for Knowledge Distillation in Automatic Speech Recognition
Abstract
Automatic Speech Recognition requires large amounts of transcribed training data to achieve good results. Since hand-labeling is both slow and expensive, the use of untranscribed speech has been explored for the task. This generally involves training a teacher model on the available transcribed speech, and then training a student model either directly on the latent representations of the teacher or on its decoded output. We propose to combine these two approaches and rescore the predictions of the teacher based on the decoded output. The rescoring depends on the probability of the decoded sentence and on how well it matches the teacher's output probability distribution. Training a student model with the proposed method yields up to an 8.6% relative improvement in character error rate and a 5.4% relative improvement in word error rate over our strongest baseline.
Year
2020
DOI
10.1109/SCISISIS50064.2020.9322742
Venue
2020 Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems (SCIS-ISIS)
Keywords
decoded utterances, knowledge distillation, Automatic Speech Recognition, hand-labeling, untranscribed speech, teacher model, student model, decoded sentence, probability distribution output, word error rate
DocType
Conference
ISSN
2377-6870
ISBN
978-1-7281-9733-3
Citations
0
PageRank
0.34
References
0
Authors
2
Name              Order  Citations  PageRank
Henning M. Holen  1      0          0.34
Jee-Hyong Lee     2      0          0.34