Title
Rescoring Teacher Outputs with Decoded Utterances for Knowledge Distillation in Automatic Speech Recognition
Abstract
Automatic Speech Recognition requires large amounts of transcribed training data to achieve good results. Since hand-labeling is both slow and expensive, the use of untranscribed speech has been explored for the task. This generally involves training a teacher model on the available transcribed speech, and then training a student model either directly on the latent representations of the teacher or on its decoded output. We propose to combine these two approaches and rescore the predictions of the teacher based on the decoded output. The rescoring depends on the probability of the decoded sentence and on how well it matches the teacher's output probability distribution. Training a student model with the proposed method yields up to an 8.6% relative improvement in character error rate and a 5.4% relative improvement in word error rate over our strongest baseline.
Year
2020
DOI
10.1109/SCISISIS50064.2020.9322742
Venue
2020 Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems (SCIS-ISIS)
Keywords
decoded utterances, knowledge distillation, Automatic Speech Recognition, hand-labeling, untranscribed speech, teacher model, student model, decoded sentence, probability distribution output, word error rate
DocType
Conference
ISSN
2377-6870
ISBN
978-1-7281-9733-3
Citations
0
PageRank
0.34
References
0
Authors
2
Name              Order  Citations  PageRank
Henning M. Holen  1      0          0.34
Jee-Hyong Lee     2      0          0.34