Title |
---|
Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition |
Abstract |
---|
Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous works use linear interpolation or a fusion network to integrate external language models, but these approaches introduce external components and increase decoding computation. In this paper, we instead propose a knowledge distillation based training approach for integrating external language models into a sequence-to-sequence model. A recurrent neural network language model, trained on large-scale external text, generates soft labels to guide the sequence-to-sequence model training; the language model thus plays the role of the teacher. This approach adds no external component to the sequence-to-sequence model during testing, and it can be flexibly combined with shallow fusion at decoding time. Experiments are conducted on the public Chinese datasets AISHELL-1 and CLMAD. Our approach achieves a character error rate of 9.3%, an 18.42% relative reduction over the vanilla sequence-to-sequence model. |
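To make the idea concrete, here is a minimal sketch of such a distillation objective in PyTorch. This illustrates generic knowledge distillation with a language-model teacher, not the paper's exact formulation: the function name, the `alpha` interpolation weight, and the `temperature` parameter are assumptions, and padding/masking is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=1.0, alpha=0.5):
    """Hypothetical training objective: interpolate the usual
    cross-entropy on ground-truth characters with a KL term that
    pulls the seq2seq (student) output distribution toward the
    RNN LM (teacher) soft labels.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    targets: (batch, seq_len) ground-truth token ids
    """
    vocab = student_logits.size(-1)
    # Hard-label cross-entropy against the reference transcript.
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         targets.view(-1))
    # Soft-label term: KL(teacher || student) at temperature T.
    log_p_student = F.log_softmax(
        student_logits.view(-1, vocab) / temperature, dim=-1)
    p_teacher = F.softmax(
        teacher_logits.view(-1, vocab) / temperature, dim=-1)
    kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return alpha * ce + (1.0 - alpha) * kl * temperature ** 2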
Year | DOI | Venue
---|---|---
2019 | 10.21437/Interspeech.2019-1554 | INTERSPEECH

DocType | Citations | PageRank
---|---|---
Conference | 1 | 0.34

References | Authors
---|---
0 | 5
Name | Order | Citations | PageRank |
---|---|---|---
Ye Bai | 1 | 7 | 5.52 |
Jiangyan Yi | 2 | 19 | 17.99 |
Jianhua Tao | 3 | 848 | 138.00 |
Zhengkun Tian | 4 | 3 | 5.79 |
Zhengqi Wen | 5 | 86 | 24.41 |