Title | ||
---|---|---|
High-Speed Power-Efficient Coarse-Grained Convolver Architecture using Depth-First Compression Scheme |
Abstract | ||
---|---|---|
Convolutional neural networks (CNNs) play an important role in various applications, e.g., computer vision. Since CNN computations require numerous multiply-accumulate (MAC) operations, performing them efficiently is a crucial issue for CNN hardware accelerators. In this paper, we propose a high-speed power-efficient convolver architecture for CNN acceleration. A 3×3 convolver is expected to produce an output every cycle, which is commonly accomplished by summing the results of nine parallel multiplications; this requires ten carry-propagation adders (CPAs) in total. In contrast, the proposed coarse-grained convolver breaks the boundary between multipliers and reduces all partial products globally. Consequently, it requires only one CPA to generate the final outcome. It also features a globally delay-optimized partial product reduction tree and a depth-first compression scheme for both area and power minimization. The proposed convolver has been implemented using TSMC 40 nm technology. Compared to a conventional 3×3 convolver baseline design, our design reduces area and power by 15.8% and 26.5%, respectively, at a clock rate of 1 GHz. |
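The contrast drawn in the abstract, nine separate multiplications followed by a sum versus one global reduction of all partial products with a single final carry-propagating addition, can be illustrated with a small word-level behavioral model. This is only a sketch under stated assumptions: operands are unsigned 8-bit values, the reduction uses plain layer-by-layer 3:2 carry-save compression, and the function names (`partial_products`, `csa`, `reduce_rows`, `coarse_convolve`, `baseline_convolve`) are hypothetical. It does not reproduce the paper's delay-optimized tree or its depth-first compression scheme.

```python
def partial_products(x, w, bits=8):
    """Unsigned AND-array partial products of x * w: one shifted row per bit of w."""
    return [(x << i) if (w >> i) & 1 else 0 for i in range(bits)]


def csa(a, b, c):
    """Word-level 3:2 carry-save compressor; invariant: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def reduce_rows(rows):
    """Compress rows layer by layer with 3:2 compressors until two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3  # rows that form complete groups of three
        nxt = []
        for i in range(0, full, 3):
            s, c = csa(rows[i], rows[i + 1], rows[i + 2])
            nxt += [s, c]
        nxt += rows[full:]  # leftover rows pass through to the next layer
        rows = nxt
    while len(rows) < 2:
        rows.append(0)
    return rows


def coarse_convolve(pixels, weights, bits=8):
    """Pool the partial products of all nine products, reduce them together,
    then perform one final addition (standing in for the single CPA)."""
    rows = []
    for x, w in zip(pixels, weights):
        rows += partial_products(x, w, bits)
    a, b = reduce_rows(rows)
    return a + b  # the only carry-propagating addition in this model


def baseline_convolve(pixels, weights):
    """Reference: nine independent multiplications followed by a sum."""
    return sum(x * w for x, w in zip(pixels, weights))
```

Calling `coarse_convolve(window, kernel)` with two length-nine lists of values in the range 0 to 255 should return the same value as `baseline_convolve(window, kernel)`; the model only shows that the global reduction is arithmetically equivalent and says nothing about area, delay, or power.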
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/ISCAS45731.2020.9180406 | 2020 IEEE International Symposium on Circuits and Systems (ISCAS) |
Keywords | DocType | ISBN |
---|---|---|
Convolvers, Delays, Computer architecture, Pipelines, Minimization, Hardware | Conference | 978-1-7281-3320-1 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors | ||
---|---|---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yi-Lin Wu | 1 | 0 | 0.68 |
Yi Lu | 2 | 0 | 0.68 |
Juinn-Dar Huang | 3 | 270 | 27.42 |