Abstract |
---|
This paper proposes an approach to distill knowledge from an ensemble of models into a single deep neural network (DNN) student model for punctuation prediction, making the student mimic the behavior of the ensemble. The ensemble consists of three single models. Kullback-Leibler (KL) divergence is used to minimize the difference between the output distribution of the DNN student model and that of the ensemble. Experimental results on the English IWSLT2011 dataset show that the ensemble outperforms the previous state-of-the-art model by up to 4.0% absolute in overall F1-score. The DNN student model also achieves up to 13.4% absolute overall F1-score improvement over the conventionally-trained baseline models. |
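The abstract describes distillation via KL divergence between the student's output distribution and the ensemble's. A minimal NumPy sketch of that loss is below; the function names, the averaging of the three models' outputs into soft targets, and the temperature value are illustrative assumptions, since the paper's implementation details are not given here.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_kl_loss(student_logits, teacher_probs, temperature=2.0):
    """Mean KL(teacher || student) over a batch.

    teacher_probs are the ensemble's soft targets; temperature=2.0 is an
    assumed value, not taken from the paper.
    """
    p = teacher_probs
    q = softmax(student_logits, temperature)
    eps = 1e-12  # guard against log(0)
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

# Hypothetical soft targets: average the per-class punctuation distributions
# of three single models (4 tokens, 3 punctuation classes).
model_outputs = [softmax(np.random.randn(4, 3)) for _ in range(3)]
soft_targets = np.mean(model_outputs, axis=0)
student_logits = np.random.randn(4, 3)
loss = distillation_kl_loss(student_logits, soft_targets)
```

Minimizing this loss pushes the student's distribution toward the ensemble's; it reaches zero exactly when the two distributions coincide.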
Year | DOI | Venue |
---|---|---|
2017 | 10.21437/Interspeech.2017-1079 | 18th Annual Conference of the International Speech Communication Association (Interspeech 2017), Vols 1-6: Situated Interaction
Keywords | Field | DocType
---|---|---
transfer learning, knowledge distillation, ensemble, neural network, punctuation prediction | Pattern recognition, Computer science, Speech recognition, Natural language processing, Artificial intelligence, Punctuation | Conference
ISSN | Citations | PageRank
---|---|---
2308-457X | 0 | 0.34
References | Authors
---|---
12 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Jiangyan Yi | 1 | 19 | 17.99 |
Jianhua Tao | 2 | 848 | 138.00 |
Zhengqi Wen | 3 | 86 | 24.41 |
Ya Li | 4 | 36 | 11.21 |