Title
An FPGA Realization of a Deep Convolutional Neural Network Using a Threshold Neuron Pruning.
Abstract
A pre-trained deep convolutional neural network (CNN) for an embedded system requires high speed and low power consumption. The former part of the CNN consists of convolutional layers, while the latter part consists of fully connected layers. In the convolutional layers, the multiply-accumulate operation is the bottleneck, whereas in the fully connected layers, memory access is the bottleneck. In this paper, we propose a neuron pruning technique which eliminates most of the weight memory. The remaining weights can then be stored in on-chip memory on the FPGA, which enables high-speed memory access. We also propose a sequential-input parallel-output fully connected layer circuit. Experimental results showed that neuron pruning reduced the number of neurons in the fully connected layers of the VGG-11 CNN by 89.3% while retaining 99% of the accuracy. We implemented the fully connected layers on the Digilent Inc. NetFPGA-1G-CML board. Compared with a CPU (ARM Cortex-A15 processor) and a GPU (Jetson TK1 Kepler), in terms of delay, the FPGA was 219.0 times faster than the CPU and 12.5 times faster than the GPU. Its performance per power was 125.28 times better than the CPU and 17.88 times better than the GPU.
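As an illustration of the general idea, the following is a minimal sketch of threshold-based neuron pruning for one fully connected layer, assuming a neuron is kept only if the sum of the absolute values of its incoming weights exceeds a threshold; the paper's exact pruning criterion and threshold selection may differ, and all names below are illustrative.

```python
import numpy as np

def prune_fc_neurons(W_in, W_out, threshold):
    """Threshold neuron pruning for one hidden fully connected layer.

    W_in  : (n_hidden, n_prev)  incoming weights of the hidden layer
    W_out : (n_next, n_hidden)  outgoing weights to the next layer

    Illustrative criterion (assumption, not necessarily the paper's rule):
    a hidden neuron survives if the L1 norm of its incoming weights
    is at least `threshold`. Pruned neurons are removed from both the
    incoming and outgoing weight matrices, shrinking the weight memory.
    """
    importance = np.abs(W_in).sum(axis=1)   # one score per hidden neuron
    keep = importance >= threshold          # boolean mask of surviving neurons
    return W_in[keep, :], W_out[:, keep], keep

# Example: prune a 4096-neuron hidden layer (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
W_in = rng.standard_normal((4096, 4096)) * 0.01
W_out = rng.standard_normal((1000, 4096)) * 0.01
W_in_p, W_out_p, keep = prune_fc_neurons(W_in, W_out, threshold=30.0)
print(f"kept {keep.sum()} of {keep.size} neurons")
```

Because entire neurons (rows/columns) are removed rather than individual weights, the pruned matrices stay dense and small enough to fit in on-chip block RAM, which is the property the FPGA implementation relies on.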
Year
2017
DOI
10.1007/978-3-319-56258-2_23
Venue
Lecture Notes in Computer Science
Field
Electrical efficiency, ARM architecture, Bottleneck, Central processing unit, Threshold potential, Computer science, Convolutional neural network, Parallel computing, Field-programmable gate array, Pruning
DocType
Conference
Volume
10216
ISSN
0302-9743
Citations
1
PageRank
0.35
References
17
Authors
4
Name             Order  Citations  PageRank
Tomoya Fujii     1      5          1.90
Simpei Sato      2      1          0.35
Hiroki Nakahara  3      155        37.34
Masato Motomura  4      91         27.81