Title
On-chip Memory Optimized CNN Accelerator with Efficient Partial-sum Accumulation
Abstract
In convolutional neural networks (CNNs), data movement between memory and processing elements (PEs) in the convolution layers dominates energy consumption. This paper proposes a convolution processing dataflow that reduces both the number of memory accesses and the on-chip buffer capacity required for convolution operations. Based on this dataflow, we design an on-chip buffer-minimized CNN accelerator. Compared with a state-of-the-art CNN accelerator, the proposed accelerator uses 2.30 times less on-chip buffer and achieves 2.18 times higher energy efficiency at the same data throughput on AlexNet. The proposed architecture can achieve higher data throughput with an almost constant on-chip buffer capacity.
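The abstract centers on accumulating convolution partial sums locally so that intermediate values need not travel back and forth to memory. The following is a minimal C sketch of that general idea only, not the authors' dataflow or hardware: an output-stationary loop nest in which each output's partial sum is kept in a local register and written back to memory once. All sizes and names (H, W, K, in, ker, out) are hypothetical.

/* Hypothetical illustration of local partial-sum accumulation;
 * not the accelerator described in the paper. */
#include <stdio.h>

#define H 8   /* input height  (hypothetical) */
#define W 8   /* input width   (hypothetical) */
#define K 3   /* kernel size   (hypothetical) */
#define OH (H - K + 1)
#define OW (W - K + 1)

static float in[H][W], ker[K][K], out[OH][OW];

int main(void) {
    /* Fill input and kernel with simple deterministic values. */
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            in[y][x] = (float)(y + x);
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            ker[i][j] = 1.0f / (K * K);

    /* Output-stationary loop nest: the partial sum stays in a
     * local register across the whole kernel window, so each
     * output element costs exactly one memory write-back. */
    for (int oy = 0; oy < OH; oy++) {
        for (int ox = 0; ox < OW; ox++) {
            float psum = 0.0f;
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    psum += in[oy + i][ox + j] * ker[i][j];
            out[oy][ox] = psum;  /* single write-back per output */
        }
    }
    printf("out[0][0] = %f\n", out[0][0]);
    return 0;
}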
Year: 2020
DOI: 10.1145/3386263.3406925
Venue: GLSVLSI '20: Great Lakes Symposium on VLSI 2020, Virtual Event, China, September 2020
DocType: Conference
ISBN: 978-1-4503-7944-1
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name                Order  Citations  PageRank
Hongjie Xu          1      0          1.69
Shiomi, J.          2      9          5.37
Hidetoshi Onodera   3      455        105.29