Abstract |
---|
Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for compression techniques that can achieve significant compression without negatively impacting inference run-time and task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective. HMD creates dense matrices that result in output features whose upper sub-vector carries "richer" features while the lower sub-vector carries "constrained" features. On the benchmarks evaluated in this paper, this results in faster inference run-time than pruning and better accuracy than matrix factorization for compression factors of 2-4x. |
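The abstract suggests a weight matrix split row-wise into a dense upper block and a compressed lower block. Below is a minimal, hedged sketch of that idea: the block sizes, names (`A`, `B`, `C`, `hmd_matvec`), and the use of a low-rank product for the lower block are illustrative assumptions based only on the abstract, not the paper's exact formulation.

```python
# Illustrative sketch of a hybrid-decomposed weight matrix (assumption based
# on the abstract): the top block A stays dense and yields "richer" upper
# output features; the bottom block is approximated by a low-rank product
# B @ C, yielding "constrained" lower output features.

def matvec(M, x):
    """Plain dense matrix-vector product over lists of lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def hmd_matvec(A, B, C, x):
    """Compute y = [A; B C] x without ever forming the dense lower block."""
    upper = matvec(A, x)             # dense top block: len(A) outputs
    lower = matvec(B, matvec(C, x))  # low-rank bottom block, rank = len(C)
    return upper + lower
```

For an m x n matrix with m1 dense rows and rank r for the remaining m2 rows, storage drops from m*n to m1*n + r*(m2 + n), which for small r lands in the 2-4x compression range the abstract reports, while keeping the computation as plain dense matrix-vector products.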
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/EMC249363.2019.00013 | 2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2) |
Keywords | DocType | Volume |
---|---|---|
Matrix factorization, pruning, model compression, NN inference | Journal | abs/1906.04886 |
ISBN | Citations | PageRank |
---|---|---|
978-1-7281-6764-0 | 1 | 0.36 |
References | Authors |
---|---|
3 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Urmish Thakker | 1 | 1 | 3.74 |
Jesse G. Beu | 2 | 2 | 3.41 |
Dibakar Gope | 3 | 1 | 0.70 |
Ganesh S. Dasika | 4 | 387 | 24.30 |
Matthew Mattina | 5 | 441 | 28.63 |