Title
Run-Time Efficient RNN Compression for Inference on Edge Devices
Abstract
Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for compression techniques that can achieve significant compression without negatively impacting inference run-time and task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective. HMD creates dense matrices that result in output features where the upper sub-vector has "richer" features while the lower sub-vector has "constrained" features. On the benchmarks evaluated in this paper, this results in faster inference run-time than pruning and better accuracy than matrix factorization for compression factors of 2-4x.
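The abstract describes output features split into a "richer" upper sub-vector and a "constrained" lower sub-vector. Below is a minimal sketch of one way such a hybrid dense/low-rank matrix-vector product could look; the shapes, variable names, NumPy usage, and block-splitting choices here are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def hmd_matvec(x, A_dense, U, V):
    """Hybrid matrix-vector product (illustrative sketch).

    The upper output sub-vector is produced by an unconstrained dense block
    ("richer" features); the lower sub-vector is produced by a low-rank
    factorization U @ V ("constrained" features).
    """
    y_upper = A_dense @ x        # dense upper block, shape (k,)
    y_lower = U @ (V @ x)        # rank-r lower block, shape (n - k,)
    return np.concatenate([y_upper, y_lower])

# Hypothetical sizes: a 256 x 256 weight matrix, upper 96 rows kept dense,
# remaining rows replaced by a rank-16 factorization.
n, k, r = 256, 96, 16
A_dense = 0.01 * np.random.randn(k, n)
U = 0.01 * np.random.randn(n - k, r)
V = 0.01 * np.random.randn(r, n)
x = np.random.randn(n)

y = hmd_matvec(x, A_dense, U, V)   # output keeps the original size n

dense_params = n * n
hmd_params = A_dense.size + U.size + V.size
print(f"compression factor: {dense_params / hmd_params:.2f}x")
```

With these assumed sizes the parameter count drops by roughly 2x relative to the full dense matrix, which is in the compression range the abstract discusses.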
Year: 2019
DOI: 10.1109/EMC249363.2019.00013
Venue: 2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2)
Keywords: Matrix factorization, pruning, model compression, NN inference
DocType: Journal
Volume: abs/1906.04886
ISBN: 978-1-7281-6764-0
Citations: 1
PageRank: 0.36
References: 3
Authors: 5
Name              Order  Citations  PageRank
Urmish Thakker    1      1          3.74
Jesse G. Beu      2      2          3.41
Dibakar Gope      3      1          0.70
Ganesh S. Dasika  4      387        24.30
Matthew Mattina   5      441        28.63