Title |
---|
An Energy-Efficient Accelerator with Relative-Indexing Memory for Sparse Compressed Convolutional Neural Network |
Abstract |
---|
Deep convolutional neural networks (CNNs) are widely used in image recognition and feature classification. However, deep CNNs are difficult to deploy fully on edge devices because of their computation-intensive and memory-intensive workloads, and their energy efficiency is dominated by off-chip memory accesses and convolution computation. In this paper, an energy-efficient accelerator is proposed for sparse compressed CNNs; it reduces DRAM accesses and eliminates zero-operand computation. Weight compression with relative indexing reduces the required memory capacity and bandwidth, since pruning removes a large portion of connections; in addition, the ReLU function produces zero-valued activations, a further source of sparsity. Workloads are distributed across channels to increase the degree of task parallelism, and all-row-to-all-row non-zero element multiplication is adopted to skip redundant computation. Compared with a dense accelerator, simulation results on VGG-16 show that the proposed accelerator achieves a 1.79x speedup while reducing on-chip memory size by 23.51%, energy by 69.53%, and DRAM accesses by 88.67%. |
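The relative-indexing compression and zero-operand skipping described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact storage format: the function names, the 4-bit index width, and the filler-entry handling for large gaps are assumptions for the sketch.

```python
# Sketch of relative-indexing compression: each nonzero weight is stored
# together with its distance (gap) from the previous nonzero, so long runs
# of pruned (zero) weights cost nothing. Index width and filler handling
# are illustrative assumptions.

def compress(weights, index_bits=4):
    """Encode a weight row as (gap, value) pairs.

    If a run of zeros exceeds the index field's range, a zero-valued
    filler entry is emitted so the gap always fits in index_bits bits.
    """
    max_gap = (1 << index_bits) - 1
    out = []
    gap = 0
    for w in weights:
        if w == 0.0:
            gap += 1
            if gap > max_gap:          # gap overflow: emit a filler entry
                out.append((max_gap, 0.0))
                gap = 0
        else:
            out.append((gap, w))
            gap = 0
    return out

def sparse_dot(compressed_w, activations):
    """Accumulate products only for nonzero weight/activation pairs,
    i.e. zero-operand computations are skipped entirely."""
    acc = 0.0
    pos = -1
    for gap, w in compressed_w:
        pos += gap + 1                 # reconstruct absolute position
        if w != 0.0 and activations[pos] != 0.0:
            acc += w * activations[pos]
    return acc
```

For example, `compress([0.0, 0.0, 3.0, 0.0, 4.0])` yields `[(2, 3.0), (1, 4.0)]`: two entries instead of five, and the multiply loop touches only those two positions. Zero-valued ReLU activations are skipped by the same operand check.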
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/AICAS.2019.8771600 | 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) |
Keywords | Field | DocType |
---|---|---|
energy-efficient accelerator,sparse compressed CNNs,DRAM accesses,weight compression,on-chip memory size,indexing memory,sparse compressed convolutional neural network,image recognition,feature classification,deep CNNs,memory-intensive workloads,energy efficiency,off-chip memory accesses,convolution computation,dense accelerator,all-row-to-all-row nonzero element multiplication,zero-operand computation,relative-indexing memory,deep convolutional neural network,computation-intensive workloads,ReLU function,zero-valued activations,task parallelism | DRAM,Convolutional neural network,Convolution,Task parallelism,Computer science,Parallel computing,Search engine indexing,Bandwidth (signal processing),Multiplication,Speedup | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-5386-7885-5 | 0 | 0.34 |
References | Authors |
---|---|
0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
I-Chen Wu | 1 | 208 | 55.03 |
Po-tsang Huang | 2 | 130 | 21.23 |
Chin-Yang Lo | 3 | 0 | 0.68 |
Wei Hwang | 4 | 254 | 44.40 |