Title: Optimal Completion Distillation for Sequence Learning
Abstract:
We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence-to-sequence models based on edit distance. OCD is efficient, has no hyper-parameters of its own, and does not require pretraining or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of all the optimal suffixes. OCD achieves state-of-the-art performance on end-to-end speech recognition on both the Wall Street Journal and Librispeech datasets, achieving 9.3% WER and 4.5% WER, respectively.
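Concretely, the suffix search the abstract describes reduces to a Levenshtein-style dynamic program over prefixes of the reference: the cheapest way to finish a sampled prefix is to edit it into some reference prefix and then copy the remaining reference verbatim. Below is a minimal sketch of that computation, assuming character-level tokens and an illustrative `</s>` end token; the function name and interface are ours, not from the paper. For each position t of the sampled prefix, it returns the set of optimal next tokens, over which OCD's per-position target distribution places equal probability.

```python
from typing import List, Set

def ocd_optimal_tokens(prefix: str, target: str, eos: str = "</s>") -> List[Set[str]]:
    """Return, for every position t (0 <= t <= len(prefix)), the set of
    optimal next tokens: the first token of every suffix that, appended to
    prefix[:t], minimizes the total edit distance to `target`.
    Illustrative sketch; names and character tokenization are assumptions."""
    n, m = len(prefix), len(target)
    # d[i][j] = edit distance between prefix[:i] and target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        d[0][j] = j                       # insert j tokens of target
    for i in range(1, n + 1):
        d[i][0] = i                       # delete i tokens of prefix
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (prefix[i - 1] != target[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    out = []
    for i in range(n + 1):
        # The best completion edits prefix[:i] into some target[:j]
        # (cost d[i][j]) and then copies target[j:] verbatim (cost 0),
        # so the optimal next tokens are target[j] for every minimizing j.
        best = min(d[i])
        opts = {target[j] for j in range(m) if d[i][j] == best}
        if d[i][m] == best:               # target fully matched: stop here
            opts.add(eos)
        out.append(opts)
    return out
```

For instance, with prefix "SU" and reference "SUNDAY", the optimal next-token set at the final position is {'N'}; once the reference is fully matched, the end token joins the set.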
Year: 2018
Venue: International Conference on Learning Representations
Field: Edit distance, Dynamic programming, Arithmetic, Distillation, Artificial intelligence, Security token, Sequence learning, Machine learning, Mathematics
DocType:
Volume: abs/1810.01398
Citations: 2
Journal:
PageRank: 0.38
References: 40
Authors: 3
Name              Order  Citations  PageRank
S. Sabour         1      92         6.55
William Chan      2      357        24.67
Mohammad Norouzi  3      1212       56.60