Abstract |
---|
This paper proposes an approach to distill knowledge from an ensemble of models into a single deep neural network (DNN) student model for punctuation prediction, making the student mimic the behavior of the ensemble. The ensemble consists of three single models. Kullback-Leibler (KL) divergence is used to minimize the difference between the output distribution of the DNN student model and that of the ensemble. Experimental results on the English IWSLT2011 dataset show that the ensemble outperforms the previous state-of-the-art model by up to 4.0% absolute in overall F1-score. The DNN student model also achieves up to 13.4% absolute overall F1-score improvement over the conventionally-trained baseline models. |
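The abstract describes distillation via KL divergence between the student's output distribution and the ensemble's. A minimal NumPy sketch of that loss is below; the function names, the averaging of the three models' outputs into soft targets, and the temperature value are illustrative assumptions, since the paper's implementation details are not given here.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_kl_loss(student_logits, teacher_probs, temperature=2.0):
    """Mean KL(teacher || student) over a batch.

    teacher_probs are the ensemble's soft targets; temperature=2.0 is an
    assumed value, not taken from the paper.
    """
    p = teacher_probs
    q = softmax(student_logits, temperature)
    eps = 1e-12  # guard against log(0)
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

# Hypothetical soft targets: average the per-class punctuation distributions
# of three single models (4 tokens, 3 punctuation classes).
model_outputs = [softmax(np.random.randn(4, 3)) for _ in range(3)]
soft_targets = np.mean(model_outputs, axis=0)
student_logits = np.random.randn(4, 3)
loss = distillation_kl_loss(student_logits, soft_targets)
```

Minimizing this loss pushes the student's distribution toward the ensemble's; it reaches zero exactly when the two distributions coincide.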
Year | DOI | Venue |
---|---|---|
2017 | 10.21437/Interspeech.2017-1079 | 18th Annual Conference of the International Speech Communication Association (Interspeech 2017), Vols 1-6: Situated Interaction
Keywords | Field | DocType
---|---|---
transfer learning, knowledge distillation, ensemble, neural network, punctuation prediction | Pattern recognition, Computer science, Speech recognition, Natural language processing, Artificial intelligence, Punctuation | Conference
ISSN | Citations | PageRank
---|---|---
2308-457X | 0 | 0.34
References | Authors
---|---
12 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Jiangyan Yi | 1 | 19 | 17.99 |
Jianhua Tao | 2 | 848 | 138.00 |
Zhengqi Wen | 3 | 86 | 24.41 |
Ya Li | 4 | 36 | 11.21 |