Abstract |
---|
Knowledge distillation is a method for improving a student network by training it to learn from a stronger teacher network. In this work, we experiment with different kinds of teacher networks to enhance the translation performance of a student Neural Machine Translation (NMT) network. We demonstrate techniques based on an ensemble teacher and a best-BLEU teacher network. We also show how to benefit from a teacher network that has the same architecture and dimensions as the student network. Furthermore, we introduce a data filtering technique based on the dissimilarity between the forward translation (obtained during knowledge distillation) of a given source sentence and its target reference, using TER to measure dissimilarity. Finally, we show that an ensemble teacher model can significantly reduce the student model size while still yielding performance improvements over the baseline student network. |
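The data filtering idea in the abstract can be sketched in a few lines: score each forward translation against its reference and drop sentence pairs whose dissimilarity exceeds a threshold. A minimal sketch follows; the true TER metric also allows block shifts, so here a word-level edit distance normalized by reference length is used as a simplified stand-in, and the function names and the threshold value are illustrative assumptions, not the paper's implementation.

```python
def edit_distance(hyp, ref):
    # Standard Levenshtein distance over word tokens (one-row DP).
    m, n = len(hyp), len(ref)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (hyp[i - 1] != ref[j - 1]))  # substitution
            prev = cur
    return dp[n]

def ter_proxy(hypothesis, reference):
    # Simplified TER: word edit distance / reference length (no shifts).
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        return float(len(hyp) > 0)
    return edit_distance(hyp, ref) / len(ref)

def filter_corpus(pairs, forward_translations, threshold=0.8):
    # Keep (source, target) pairs whose forward translation is
    # sufficiently similar to the target reference.
    kept = []
    for (src, tgt), fwd in zip(pairs, forward_translations):
        if ter_proxy(fwd, tgt) <= threshold:
            kept.append((src, tgt))
    return kept
```

The surviving pairs would then be used (with the teacher's forward translations) to train the student, discarding sentences the teacher translates very differently from the reference.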
Year | Venue | Field
---|---|---
2017 | arXiv: Computation and Language | Data filtering, Computer science, Machine translation, Oracle, Artificial intelligence, Natural language processing, Speedup, Architecture, Speech recognition, Distillation, Decoding methods, Sentence, Machine learning

DocType | Volume | Citations
---|---|---
Journal | abs/1702.01802 | 3

PageRank | References | Authors
---|---|---
0.40 | 5 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Markus Freitag | 1 | 86 | 15.28 |
Yaser Al-Onaizan | 2 | 540 | 38.51 |
Baskaran Sankaran | 3 | 155 | 13.65 |