Title
Towards Compact and Fast Neural Machine Translation Using a Combined Method.
Abstract
Neural Machine Translation (NMT) places a heavy burden on computation and memory, making it challenging to deploy NMT models on devices with limited computation and memory budgets. This paper presents a four-stage pipeline to compress the model and speed up decoding for NMT. Our method first introduces a compact architecture based on a convolutional encoder and weight-shared embeddings. Then weight pruning is applied to obtain a sparse model. Next, we propose a fast sequence interpolation approach that enables greedy decoding to achieve performance on par with beam search, so the time-consuming beam search can be replaced by simple greedy decoding. Finally, vocabulary selection is used to reduce the computation of the softmax layer. Our final model achieves a 10× speedup, a 17× reduction in parameters, a 35 MB storage size, and performance comparable to the baseline model.
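To illustrate the weight-pruning stage mentioned in the abstract, the sketch below shows simple magnitude-based pruning in NumPy: entries whose absolute value falls below a percentile threshold are zeroed, yielding a sparse weight matrix. The percentile threshold, the layer-by-layer application, and the function name are illustrative assumptions, not the exact procedure of the paper.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that roughly
    `sparsity` (e.g. 0.8 for 80%) of the matrix becomes zero."""
    threshold = np.percentile(np.abs(weights), sparsity * 100)
    mask = np.abs(weights) >= threshold
    return weights * mask

# Example: prune a hypothetical 512x256 weight matrix to 80% sparsity.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
W_sparse = magnitude_prune(W, sparsity=0.8)
print(f"nonzero fraction: {np.count_nonzero(W_sparse) / W_sparse.size:.2f}")
```

In practice, pruned models are typically fine-tuned afterwards to recover accuracy, and the zeroed weights are stored in a sparse format to realize the storage savings.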
Year
2017
DOI
10.18653/v1/d17-1154
Venue
EMNLP
Field
Softmax function, Computer science, Interpolation, Beam search, Algorithm, Artificial intelligence, Encoder, Decoding methods, Artificial neural network, Machine learning, Speedup, Computation
DocType
Conference
Volume
D17-1
Citations
3
PageRank
0.38
References
14
Authors
5
Name | Order | Citations | PageRank
Xiaowei Zhang | 1 | 12 | 4.06
Wei Chen | 2 | 9 | 2.86
Feng Wang | 3 | 27 | 3.51
Shuang Xu | 4 | 4 | 2.76
Bo Xu | 5 | 241 | 36.59