Title
Weight Compression MAC Accelerator for Effective Inference of Deep Learning
Abstract
Many studies of deep neural networks have reported inference accelerators with improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, developed through the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of the weights, so inference execution time and energy consumption are reduced in proportion to the total number of computations multiplied by the average bit precision of the weights. Hardware utilization is also improved by a bit-parallel architecture suited to the granularly quantized bit precisions of the weights. We implement the proposed architecture on an FPGA and demonstrate that execution cycles for ResNet-50 on ImageNet are reduced to 1/5.3 of those of a conventional method, while maintaining recognition accuracy.
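As a rough illustration only (a minimal sketch, not the authors' implementation), the following NumPy code shows one way filter-wise quantization with variable bit precision can work: each convolution filter is assigned the smallest bit width whose quantization error stays under a tolerance, so the average bit precision across filters drops. The function names, the relative-error criterion, and the candidate bit widths are all assumptions for illustration.

import numpy as np

def quantize_filter(w, bits):
    # Symmetric uniform quantization of one filter to the given bit width.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return w
    return np.round(w / scale) * scale

def filterwise_quantize(filters, max_rel_err=1e-2, candidate_bits=(2, 3, 4, 5, 6, 7, 8)):
    # Per filter, pick the smallest bit width that keeps the relative
    # quantization error below the tolerance; falls back to the widest
    # candidate if none qualifies.
    quantized, chosen = [], []
    for w in filters:
        for bits in candidate_bits:
            q = quantize_filter(w, bits)
            err = np.linalg.norm(w - q) / (np.linalg.norm(w) + 1e-12)
            if err <= max_rel_err:
                break
        quantized.append(q)
        chosen.append(bits)
    return quantized, chosen

# Hypothetical usage with random 3x3 filters:
filters = [np.random.randn(3, 3) for _ in range(64)]
_, bits_per_filter = filterwise_quantize(filters)
print("average bit precision:", np.mean(bits_per_filter))

Per the abstract, execution time and energy then scale roughly with the number of multiply-accumulate operations times this average bit precision, so lowering the average chosen bit width translates directly into fewer cycles on hardware that exploits per-filter bit widths.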
Year
2020
DOI
10.1587/transele.2019CTP0007
Venue
IEICE TRANSACTIONS ON ELECTRONICS
Keywords
deep learning, convolutional neural network, quantization, variable bit width, post-training, inference, accelerator, processor, FPGA
DocType
Journal
Volume
E103C
Issue
10
ISSN
1745-1353
Citations
0
PageRank
0.34
References
0
Authors
8
Name                Order  Citations  PageRank
Asuka Maki          1      2          1.86
Daisuke Miyashita   2      72         9.99
Shin-ichi Sasaki    3      160        44.66
Kengo Nakata        4      0          0.34
Fumihiko Tachibana  5      37         5.98
Tomoya Suzuki       6      24         3.37
Jun Deguchi         7      1          2.38
Ryuichi Fujimoto    8      24         14.44