Title
Run-Time Efficient RNN Compression for Inference on Edge Devices
Abstract
Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for compression techniques that can achieve significant compression without negatively impacting inference run-time and task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective. HMD creates dense matrices that result in output features where the upper sub-vector has "richer" features while the lower sub-vector has "constrained" features. On the benchmarks evaluated in this paper, this results in faster inference run-time than pruning and better accuracy than matrix factorization for compression factors of 2-4x.
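The abstract describes output features split into a "richer" upper sub-vector and a "constrained" lower sub-vector. Below is a minimal sketch of one way such a hybrid dense/low-rank matrix-vector product could look; the shapes, variable names, NumPy usage, and block-splitting choices here are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def hmd_matvec(x, A_dense, U, V):
    """Hybrid matrix-vector product (illustrative sketch).

    The upper output sub-vector is produced by an unconstrained dense block
    ("richer" features); the lower sub-vector is produced by a low-rank
    factorization U @ V ("constrained" features).
    """
    y_upper = A_dense @ x        # dense upper block, shape (k,)
    y_lower = U @ (V @ x)        # rank-r lower block, shape (n - k,)
    return np.concatenate([y_upper, y_lower])

# Hypothetical sizes: a 256 x 256 weight matrix, upper 96 rows kept dense,
# remaining rows replaced by a rank-16 factorization.
n, k, r = 256, 96, 16
A_dense = 0.01 * np.random.randn(k, n)
U = 0.01 * np.random.randn(n - k, r)
V = 0.01 * np.random.randn(r, n)
x = np.random.randn(n)

y = hmd_matvec(x, A_dense, U, V)   # output keeps the original size n

dense_params = n * n
hmd_params = A_dense.size + U.size + V.size
print(f"compression factor: {dense_params / hmd_params:.2f}x")
```

With these assumed sizes the parameter count drops by roughly 2x relative to the full dense matrix, which is in the compression range the abstract discusses.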
Year: 2019
DOI: 10.1109/EMC249363.2019.00013
Venue: 2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2)
Keywords: Matrix factorization, pruning, model compression, NN inference
DocType: Journal
Volume: abs/1906.04886
ISBN: 978-1-7281-6764-0
Citations: 1
PageRank: 0.36
References: 3
Authors: 5
Name              Order  Citations  PageRank
Urmish Thakker    1      1          3.74
Jesse G. Beu      2      2          3.41
Dibakar Gope      3      1          0.70
Ganesh S. Dasika  4      387        24.30
Matthew Mattina   5      441        28.63