Title
A Coarse-Grained Dual-Convolver Based CNN Accelerator with High Computing Resource Utilization
Abstract
Deep learning technologies have developed rapidly in recent years and play an important role in our lives. Among them, convolutional neural networks (CNNs) perform well in many applications. Result quality generally improves as the number of convolutional layers increases, but so does the computational complexity. Hence, a highly resource-efficient accelerator is needed. In this paper, we propose a new CNN accelerator that features a delay-chain-free input data aligner and a dual-convolver processing element (DCPE). Our architecture does not require delay chains with a large number of registers for input data alignment, which not only reduces area and power but also improves overall resource utilization. In addition, a set of DCPEs shares the same input aligner to produce multiple output feature maps concurrently, which provides the desired computing power and reduces external memory traffic. An accelerator instance with 8 DCPEs (144 MACs) has been implemented in a TSMC 40 nm process. The internal logic consumes only 285K gates and the total internal memory size is merely 44 KB. When running VGG-16, the average performance is 190 GOPS (at 750 MHz), the resource (MAC) utilization reaches 88.3%, and the energy efficiency is 481 GOPS/W.
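The headline figures in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and not taken from the paper. It assumes each MAC contributes two operations per cycle (one multiply and one add), which is a common convention but is not stated in the abstract, and it assumes the energy-efficiency figure is simply average throughput divided by power. The function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the abstract's headline numbers.
NUM_MACS = 144          # from the abstract (8 DCPEs in total)
CLOCK_HZ = 750e6        # from the abstract
AVG_GOPS = 190.0        # from the abstract (average over VGG-16)
GOPS_PER_WATT = 481.0   # from the abstract
OPS_PER_MAC = 2         # ASSUMPTION: 1 multiply + 1 add per MAC per cycle


def peak_gops(num_macs, clock_hz, ops_per_mac=OPS_PER_MAC):
    """Peak throughput in GOPS if every MAC is busy on every cycle."""
    return num_macs * ops_per_mac * clock_hz / 1e9


def utilization(avg_gops, peak):
    """Fraction of the peak throughput that is actually achieved."""
    return avg_gops / peak


def implied_power_watts(avg_gops, gops_per_watt):
    """Power implied by the efficiency figure (ASSUMPTION: GOPS/W = avg GOPS / power)."""
    return avg_gops / gops_per_watt


peak = peak_gops(NUM_MACS, CLOCK_HZ)
util = utilization(AVG_GOPS, peak)
power = implied_power_watts(AVG_GOPS, GOPS_PER_WATT)

print(f"peak throughput : {peak:.0f} GOPS")
print(f"MAC utilization : {util:.1%}")
print(f"implied power   : {power * 1e3:.0f} mW")
```

Under these assumptions the peak works out to 216 GOPS, so an average of 190 GOPS corresponds to roughly 88% MAC utilization, in line with the utilization the abstract reports, and the efficiency figure implies a power draw of a little under 0.4 W.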
Year
2020
DOI
10.1109/AICAS48895.2020.9073835
Venue
2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Keywords
convolutional neural network (CNN), hardware accelerator, high resource utilization, low data bandwidth
DocType
Conference
ISBN
978-1-7281-4923-3
Citations
0
PageRank
0.34
References
0
Authors (3)
Name | Order | Citations | PageRank
Yi Lu | 1 | 0 | 0.68
Yi-Lin Wu | 2 | 0 | 0.68
Juinn-Dar Huang | 3 | 270 | 27.42