Abstract | ||
---|---|---|
Despite their outstanding success in solving complex computer vision problems, Deep Neural Networks (DNNs) still require high-performance hardware for real-time inference. Therefore, they are not applicable to low-cost embedded hardware, where memory resources, computational performance and power consumption are restricted. Furthermore, current approaches to fitting neural networks to embedded hardware are time-consuming, which slows development cycles. To address these drawbacks and satisfy the demands of embedded hardware, this paper proposes a computationally efficient magnitude-based pruning scheme, based on a half-interval search, combined with effective weight sharing, fixed-point quantization, and lossless compression. The proposed solution can be used to generate a model optimized with respect to either memory demand or execution time. For instance, the memory demand of LeNet is reduced by about 385×. VGG16 is pruned by about 14.5×, whilst its computational costs are reduced by about 1.6× for a CPU-based application and 4.8× for an FPGA-based one. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/CRV.2019.00011 | 2019 16th Conference on Computer and Robot Vision (CRV) |
Keywords | Field | DocType |
Pruning,Quantization,Weight-Sharing,Coding,Embedded HW,CNN | Computer vision,Inference,Computer science,Field-programmable gate array,Embedded applications,Artificial intelligence,Execution time,Artificial neural network,Quantization (signal processing),Computer engineering,Lossless compression,Power consumption | Conference |
ISBN | Citations | PageRank |
978-1-7281-1839-0 | 0 | 0.34 |
References | Authors | |
5 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alexander Frickenstein | 1 | 4 | 3.53 |
Christian Unger | 2 | 7 | 2.59 |
Walter Stechele | 3 | 365 | 52.77 |
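
The abstract names a magnitude-based pruning scheme driven by a half-interval search but does not spell out the procedure. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the search bisects the pruning ratio, and that a caller-supplied `evaluate` function (hypothetical, standing in for validation accuracy) does not increase as more weights are removed. The names `prune_by_magnitude`, `half_interval_search`, `min_score` and `steps` are illustrative only.

```python
import numpy as np


def prune_by_magnitude(weights, ratio):
    """Zero out roughly the `ratio` fraction of weights with smallest magnitude."""
    magnitudes = np.abs(weights).ravel()
    k = int(ratio * magnitudes.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude acts as the pruning threshold.
    # Ties at the threshold are pruned too, so slightly more than k may go.
    threshold = np.partition(magnitudes, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)


def half_interval_search(weights, evaluate, min_score, steps=10):
    """Bisect the pruning ratio in [0, 1] while `evaluate` stays >= min_score.

    Assumption: `evaluate` is non-increasing in the pruning ratio, so the
    acceptable ratios form an interval starting at 0.
    """
    lo, hi = 0.0, 1.0
    best, best_ratio = weights.copy(), 0.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        candidate = prune_by_magnitude(weights, mid)
        if evaluate(candidate) >= min_score:
            # Still acceptable: keep this candidate and try pruning more.
            best, best_ratio = candidate, mid
            lo = mid
        else:
            # Too aggressive: back off to the lower half of the interval.
            hi = mid
    return best, best_ratio
```

Each step halves the remaining interval, so `steps` evaluations narrow the pruning ratio to within `1 / 2**steps`. In a real setting `evaluate` would run the pruned network on a validation set, which is the costly part that the bisection keeps to a small number of calls.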