Title
A Tri-State Weight Convolutional Neural Network for an FPGA: Applied to YOLOv2 Object Detector
Abstract
Frame-based object detectors such as YOLO (You Only Look Once) are used in embedded vision systems, including robots, automobiles, security cameras, and drones. However, these systems require detection with high performance per power on an inexpensive device. In this paper, we propose a tri-state weight CNN, which generalizes low-precision and sparse (pruned) CNN weights. In the former part of the network, we set the weights to {-1,0,+1} as a ternary CNN, while in the latter part, we set them to {-w,0,+w} as a sparse weight CNN. The proposed tri-state CNN is a kind of mixed-precision CNN, which suits an object detector consisting of bounding box prediction (regression) and class estimation (classification). We apply an indirect memory access architecture to skip the zero weights and propose a weight-parallel 2D convolutional circuit. It can be applied efficiently to an AlexNet-based CNN, which has kernels of different sizes. We design an AlexNet-based YOLOv2 to reduce the number of layers for low-latency computation. In the experiment, the proposed tri-state scheme reduces the weight memory size by 92%. We implement the proposed tri-state weight YOLOv2 on the Avnet Inc. UltraZed-EG starter kit, which has the Xilinx Inc. Zynq UltraScale+ MPSoC ZU3EG. It achieved 61.70 frames per second (FPS), which exceeds the standard video frame rate (29.97 FPS). Compared with the ARM Cortex-A57, it was 268.2 times faster, and its performance-per-power efficiency was 313.51 times better. Compared with the NVIDIA Pascal embedded GPU, it was 4.0 times faster, and its performance-per-power efficiency was 11.35 times better.
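The tri-state weight idea and the zero-skipping access described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold rule, the choice of w as the mean magnitude of the surviving weights, and the function names `to_tri_state`, `compress`, and `sparse_dot` are all assumptions made for illustration. Setting w to 1 gives the ternary case {-1,0,+1}.

```python
# Hypothetical sketch: tri-state weights {-w, 0, +w} with zero skipping.
# The threshold rule and the mean-magnitude scale are assumptions,
# not taken from the paper.

def to_tri_state(weights, threshold):
    """Map real weights to a shared scale w and signs in {-1, 0, +1}."""
    kept = [abs(x) for x in weights if abs(x) > threshold]
    w = sum(kept) / len(kept) if kept else 0.0
    signs = [0 if abs(x) <= threshold else (1 if x > 0 else -1)
             for x in weights]
    return w, signs

def compress(signs):
    """Keep only nonzero weights as (index, sign) pairs.

    The stored index lets the consumer fetch the matching input
    indirectly, so zero weights cost neither memory nor compute.
    """
    return [(i, s) for i, s in enumerate(signs) if s != 0]

def sparse_dot(w, entries, inputs):
    """Dot product over nonzero weights only: add or subtract the
    input, then apply the shared scale w once at the end."""
    acc = 0.0
    for i, s in entries:
        acc += inputs[i] if s > 0 else -inputs[i]
    return w * acc

weights = [0.9, -0.05, 0.0, -1.1, 0.02, 1.0]
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

w, signs = to_tri_state(weights, threshold=0.1)
entries = compress(signs)
result = sparse_dot(w, entries, inputs)

# Reference: the same dot product computed densely.
dense = sum(w * s * x for s, x in zip(signs, inputs))
```

In this toy case half of the weights are zero, so the sparse loop visits three entries instead of six, and the only multiplication is the final scaling by w.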
Year
2018
DOI
10.1109/FPT.2018.00058
Venue
2018 International Conference on Field-Programmable Technology (FPT)
Keywords
FPGA, Object Detection, Deep Learning, Embedded System
Field
Object detection, Computer science, Convolutional neural network, Parallel computing, Field-programmable gate array, Computational science, Artificial intelligence, Frame rate, Deep learning, Detector, MPSoC, Minimum bounding box
DocType
Conference
ISBN
978-1-7281-0215-3
Citations
0
PageRank
0.34
References
0
Authors
3
Name, Order, Citations, PageRank
Hiroki Nakahara, 1, 155, 37.34
Masayuki Shimoda, 2, 8, 6.45
Shimpei Sato, 3, 43, 13.03