Title
On-Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA
Abstract
A pre-trained convolutional deep neural network (CNN) is, from a feed-forward computation perspective, widely used in embedded systems, which require high power and area efficiency. This paper proposes a binarized CNN on an FPGA that uses only two binary values (+1/-1) for the inputs and the weights. In this case, each multiplier is replaced by an XNOR circuit instead of a dedicated DSP block, so binarized inputs and weights are well suited to hardware implementation. However, a binarized CNN requires batch normalization to retain classification accuracy; the additional multiplications and additions demand extra hardware, and the memory accesses for the batch normalization parameters reduce system performance. In this paper, we propose a batch-normalization-free binarized CNN that is mathematically equivalent to one using batch normalization: it processes the binarized inputs and weights together with an integer bias. We implemented the VGG-16 benchmark CNN on the Xilinx Inc. Zynq UltraScale+ MPSoC ZCU102 evaluation board. Our binarized CNN stores all the weights, inputs, and outputs in on-chip BRAMs, which are faster and dissipate less power than an off-chip memory such as a DDR4 SDRAM. Compared with conventional FPGA realizations, although the classification accuracy decreases by 6.5%, the performance is 2.45 times higher, the power efficiency is slightly better, and the area efficiency is 2.68 times better. Compared with the ARM Cortex-A57, our implementation is 136.8 times faster and dissipates 3.1 times more power, so its performance per power is 44.7 times better. Also, compared with the Maxwell embedded GPU, it is 4.9 times faster and dissipates 1.3 times more power, so its performance per power is 3.8 times better. Thus, our method is suitable for embedded computer systems.
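The core of the technique described in the abstract can be sketched in software: with +1/-1 values packed as bits, a multiply-accumulate becomes an XNOR followed by a popcount, and the batch normalization that follows can be folded at inference time into a single integer threshold (an integer bias). The function and parameter names below are illustrative, not taken from the paper's implementation; this is a minimal sketch of the idea, assuming a positive batch-normalization scale so the sign of the comparison is preserved.

```python
# Hedged sketch of a binarized, batch-normalization-free neuron.
# Vectors over {-1, +1} are packed as n-bit masks: bit = 1 encodes +1,
# bit = 0 encodes -1. All names here are illustrative assumptions.

def bin_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two packed {-1, +1} vectors of length n.
    XNOR marks positions where the signs agree; popcount counts them.
    Each agreement contributes +1 and each disagreement -1."""
    mask = (1 << n) - 1
    agree = ~(x_bits ^ w_bits) & mask        # XNOR, truncated to n bits
    return 2 * bin(agree).count("1") - n

def bn_free_neuron(x_bits: int, w_bits: int, n: int, threshold: int) -> int:
    """sign(BatchNorm(dot)) with a positive BN scale is equivalent to
    comparing the integer dot product against a precomputed integer
    threshold, so no floating-point BN arithmetic or BN parameter
    fetches are needed at inference time."""
    return 1 if bin_dot(x_bits, w_bits, n) >= threshold else -1
```

For example, with x = 0b1011 and w = 0b1001 (n = 4), the signs agree in three positions, so the dot product is 2, and the neuron fires (+1) for any threshold of 2 or less. The hardware appeal is that `bin_dot` maps to an XNOR gate array plus a popcount tree, with the threshold stored as a small integer constant per neuron instead of four batch-normalization parameters.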
Year
2017
DOI
10.1109/IPDPSW.2017.95
Venue
2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Keywords
FPGA, Deep Neural Network, CNN, Binarized CNN
Field
Electrical efficiency, XNOR gate, Normalization (statistics), Computer science, Parallel computing, Algorithm, Field-programmable gate array, Multiplier (economics), Multiplication, Artificial neural network, MPSoC, Distributed computing
DocType
Conference
ISSN
2164-7062
ISBN
978-1-5386-3409-7
Citations
7
PageRank
0.74
References
20
Authors
2
Name	Order	Citations	PageRank
Haruyoshi Yonekawa	1	34	4.37
Hiroki Nakahara	2	155	37.34