Title
Distilling Knowledge From An Ensemble Of Models For Punctuation Prediction
Abstract
This paper proposes an approach to distill knowledge from an ensemble of models to a single deep neural network (DNN) student model for punctuation prediction. This approach makes the DNN student model mimic the behavior of the ensemble. The ensemble consists of three single models. Kullback-Leibler (KL) divergence is used to minimize the difference between the output distribution of the DNN student model and that of the ensemble. Experimental results on the English IWSLT2011 dataset show that the ensemble outperforms the previous state-of-the-art model by up to 4.0% absolute in overall F1-score. The DNN student model also achieves up to 13.4% absolute overall F1-score improvement over the conventionally trained baseline models.
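As a rough illustration of the distillation objective described in the abstract, the sketch below averages the ensemble members' output distributions into a teacher distribution and trains the student to match it with a KL-divergence loss. This is not the authors' code: the PyTorch framing, the temperature T, and all variable names are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the paper): KL-divergence distillation
# from an ensemble of teacher models to a single student model.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, ensemble_logits_list, T=1.0):
    """KL(teacher || student), averaged over the batch.

    student_logits:       [batch, num_punct_classes] tensor from the student
    ensemble_logits_list: list of [batch, num_punct_classes] tensors,
                          one per model in the ensemble
    T:                    softmax temperature (assumed hyperparameter)
    """
    # Average the ensemble members' softened distributions to form the teacher.
    teacher_probs = torch.stack(
        [F.softmax(logits / T, dim=-1) for logits in ensemble_logits_list]
    ).mean(dim=0)

    # Student distribution in log space, as required by F.kl_div.
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)

    # KL divergence between the ensemble (teacher) and student distributions.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```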
Year
2017
DOI
10.21437/Interspeech.2017-1079
Venue
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION
Keywords
transfer learning, knowledge distillation, ensemble, neural network, punctuation prediction
Field
Pattern recognition, Computer science, Speech recognition, Natural language processing, Artificial intelligence, Punctuation
DocType
Conference
ISSN
2308-457X
Citations
0
PageRank
0.34
References
12
Authors
4
Name          Order  Citations  PageRank
Jiangyan Yi   1      19         17.99
Jianhua Tao   2      848        138.00
Zhengqi Wen   3      86         24.41
Ya Li         4      36         11.21