Abstract |
---|
Many efforts have been made to improve the inference efficiency of deep convolutional neural networks. To achieve further efficiency gains without an accuracy penalty, we propose filter-wise optimized quantization with variable precision, together with a hardware architecture that fully supports it: by optimizing the weight bit precision filter by filter, the bit precision of operations is reduced, and execution time decreases in proportion to the total number of computations multiplied by the number of weight bits. We implement the proposed architecture on an FPGA and demonstrate that ResNet-50 runs in 5.3× fewer execution cycles without an accuracy penalty. |
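The core idea in the abstract, assigning each convolution filter its own weight bit precision, can be sketched as below. This is a minimal illustration under assumed details (uniform symmetric quantization, a per-filter scale from the maximum absolute weight); the function name `quantize_filterwise` and the bit-allocation inputs are hypothetical, as the abstract does not specify the paper's actual quantization scheme.

```python
import numpy as np

def quantize_filterwise(weights, bits_per_filter):
    """Quantize each output-channel filter of a conv weight tensor
    to its own bit width (uniform symmetric quantize-dequantize).

    weights: array of shape (num_filters, ...), one filter per row.
    bits_per_filter: iterable of per-filter bit widths, e.g. [2, 4, 8, ...].
    """
    out = np.empty_like(weights, dtype=np.float64)
    for f, bits in enumerate(bits_per_filter):
        w = weights[f]
        levels = 2 ** (bits - 1) - 1              # signed symmetric levels
        scale = np.max(np.abs(w)) / levels if levels > 0 else 1.0
        out[f] = np.round(w / scale) * scale      # quantize, then dequantize
    return out

def relative_cycle_cost(macs_per_filter, bits_per_filter):
    """Execution-cost model implied by the abstract: cycles scale with
    (number of computations) x (number of weight bits), summed per filter."""
    return sum(m * b for m, b in zip(macs_per_filter, bits_per_filter))
```

With this cost model, reducing the average weight bit width from 8 to about 1.5 bits would yield the 5.3× cycle reduction reported for ResNet-50, consistent with the claim that execution time scales with the product of computation count and weight bit count.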
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ASSCC.2018.8579342 | 2018 IEEE Asian Solid-State Circuits Conference (A-SSCC) |
Keywords | Field | DocType
---|---|---|
deep learning,convolutional neural network,quantization,variable bit width,FPGA | Convolutional neural network,Computer science,Convolution,Inference,Field-programmable gate array,Real-time computing,Execution time,Quantization (signal processing),Computer hardware,Hardware architecture,Computation | Conference
ISBN | Citations | PageRank
---|---|---|
978-1-5386-6414-8 | 0 | 0.34
References | Authors
---|---|
0 | 6
Name | Order | Citations | PageRank |
---|---|---|---|
Asuka Maki | 1 | 2 | 1.86 |
Daisuke Miyashita | 2 | 72 | 9.99 |
Kengo Nakata | 3 | 21 | 6.32 |
Fumihiko Tachibana | 4 | 37 | 5.98 |
Tomoya Suzuki | 5 | 24 | 3.37 |
Jun Deguchi | 6 | 1 | 2.38 |