Title
On-Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA
Abstract
A pre-trained convolutional deep neural network (CNN) is, from a feed-forward computation perspective, widely used in embedded systems, which require high power and area efficiency. This paper proposes a binarized CNN on an FPGA that uses only two binary values (+1/-1) for the inputs and the weights. In this case, each multiplier is replaced by an XNOR circuit instead of a dedicated DSP block, so binarized inputs and weights are well suited to hardware implementation. However, a binarized CNN requires batch normalization to retain classification accuracy; the additional multiplications and additions demand extra hardware, and the memory accesses for the batch normalization parameters reduce system performance. In this paper, we propose a batch-normalization-free binarized CNN that is mathematically equivalent to one using batch normalization: it processes the binarized inputs and weights together with an integer bias. We implemented the VGG-16 benchmark CNN on the Xilinx Inc. Zynq UltraScale+ MPSoC ZCU102 evaluation board. Our binarized CNN stores all the weights, inputs, and outputs in on-chip BRAMs, which are faster and dissipate less power than an off-chip memory such as a DDR4 SDRAM. Compared with conventional FPGA realizations, although the classification accuracy decreases by 6.5%, the performance is 2.45 times higher, the power efficiency is slightly better, and the area efficiency is 2.68 times better. Compared with the ARM Cortex-A57, our implementation is 136.8 times faster and dissipates 3.1 times more power, so its performance per power is 44.7 times better. Also, compared with the Maxwell embedded GPU, it is 4.9 times faster and dissipates 1.3 times more power, so its performance per power is 3.8 times better. Thus, our method is suitable for embedded computer systems.
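The core of the technique described in the abstract can be sketched in software: with +1/-1 values packed as bits, a multiply-accumulate becomes an XNOR followed by a popcount, and the batch normalization that follows can be folded at inference time into a single integer threshold (an integer bias). The function and parameter names below are illustrative, not taken from the paper's implementation; this is a minimal sketch of the idea, assuming a positive batch-normalization scale so the sign of the comparison is preserved.

```python
# Hedged sketch of a binarized, batch-normalization-free neuron.
# Vectors over {-1, +1} are packed as n-bit masks: bit = 1 encodes +1,
# bit = 0 encodes -1. All names here are illustrative assumptions.

def bin_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two packed {-1, +1} vectors of length n.
    XNOR marks positions where the signs agree; popcount counts them.
    Each agreement contributes +1 and each disagreement -1."""
    mask = (1 << n) - 1
    agree = ~(x_bits ^ w_bits) & mask        # XNOR, truncated to n bits
    return 2 * bin(agree).count("1") - n

def bn_free_neuron(x_bits: int, w_bits: int, n: int, threshold: int) -> int:
    """sign(BatchNorm(dot)) with a positive BN scale is equivalent to
    comparing the integer dot product against a precomputed integer
    threshold, so no floating-point BN arithmetic or BN parameter
    fetches are needed at inference time."""
    return 1 if bin_dot(x_bits, w_bits, n) >= threshold else -1
```

For example, with x = 0b1011 and w = 0b1001 (n = 4), the signs agree in three positions, so the dot product is 2, and the neuron fires (+1) for any threshold of 2 or less. The hardware appeal is that `bin_dot` maps to an XNOR gate array plus a popcount tree, with the threshold stored as a small integer constant per neuron instead of four batch-normalization parameters.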
Year
2017
DOI
10.1109/IPDPSW.2017.95
Venue
2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Keywords
FPGA, Deep Neural Network, CNN, Binarized CNN
Field
Electrical efficiency, XNOR gate, Normalization (statistics), Computer science, Parallel computing, Algorithm, Field-programmable gate array, Multiplier (economics), Multiplication, Artificial neural network, MPSoC, Distributed computing
DocType
Conference
ISSN
2164-7062
ISBN
978-1-5386-3409-7
Citations
7
PageRank
0.74
References
20
Authors
2
Name	Order	Citations	PageRank
Haruyoshi Yonekawa	1	34	4.37
Hiroki Nakahara	2	155	37.34