Title
O⁴-DNN: A Hybrid DSP-LUT-Based Processing Unit With Operation Packing and Out-of-Order Execution for Efficient Realization of Convolutional Neural Networks on FPGA Devices
Abstract
In this paper, we propose O⁴-DNN, a high-performance FPGA-based architecture for convolutional neural network (CNN) accelerators that relies on operation packing and out-of-order (OoO) execution for DSP blocks augmented with LUT-based glue logic. The high-level architecture comprises a systolic array of processing elements (PEs) supporting an output-stationary dataflow. The computational unit of each PE is realized with one DSP block and a small number of LUTs. Given the limited number of DSP blocks in FPGAs, this combination of a DSP block and a few LUTs extracts more computational power from each DSP block. The proposed computational unit performs eight convolutional operations on five input operands: one 8-bit weight and four 8-bit input feature (IF) map operands. In addition, to improve the energy efficiency of the proposed computational unit, we present an approximate form of the unit suitable for neural network applications. To reduce the memory bandwidth and increase the utilization of the computational units, a data reuse technique based on weight sharing is also presented. To further improve the performance of the proposed computational unit, we propose an addressing approach for computing the partial sums out of order. The efficacy of the architecture is assessed on two FPGA devices executing four state-of-the-art neural networks. Experimental results show that the architecture achieves 2.5× higher throughput on average (up to 3.44×) than a baseline structure. In addition, O⁴-DNN improves energy efficiency by 12% on average (up to 40%) over the baseline structure.
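The operation packing mentioned in the abstract can be illustrated with a minimal arithmetic sketch. This is not the authors' DSP-LUT unit: it assumes unsigned 8-bit operands, places each input in its own 16-bit lane of one wide integer, and ignores real DSP port widths, signed arithmetic, and the LUT glue logic. The names `LANE_BITS`, `pack_inputs`, `packed_multiply`, and `accumulate` are hypothetical. Reading the "eight operations" as four multiplies plus four accumulations is also an assumption, since the abstract does not spell this out.

```python
# Assumption: unsigned 8-bit operands, so each product (at most 255 * 255 = 65025)
# fits in a 16-bit lane and neighbouring lanes never overlap.
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def pack_inputs(ifs):
    """Place each unsigned 8-bit input in its own 16-bit lane of one integer."""
    packed = 0
    for lane, x in enumerate(ifs):
        assert 0 <= x <= 0xFF, "sketch only handles unsigned 8-bit inputs"
        packed |= x << (lane * LANE_BITS)
    return packed


def packed_multiply(weight, ifs):
    """A single wide multiply yields weight * x for every packed input x."""
    assert 0 <= weight <= 0xFF, "sketch only handles an unsigned 8-bit weight"
    wide = pack_inputs(ifs) * weight
    # Slice the wide product back into one 16-bit result per lane.
    return [(wide >> (lane * LANE_BITS)) & LANE_MASK for lane in range(len(ifs))]


def accumulate(psums, weight, ifs):
    """Add the packed products to running partial sums (output-stationary style)."""
    return [p + prod for p, prod in zip(psums, packed_multiply(weight, ifs))]


if __name__ == "__main__":
    print(packed_multiply(3, [1, 2, 255, 0]))
    print(accumulate([10, 0, 0, 0], 2, [1, 2, 3, 4]))
```

The shared weight is what makes the packing possible: one multiplier operand is common to all four products, so the four inputs can share the other operand.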
Year
2020
DOI
10.1109/TCSI.2020.2986350
Venue
IEEE Transactions on Circuits and Systems I: Regular Papers
Keywords
Convolutional neural network, DSP block, FPGA, out-of-order computations, systolic array
DocType
Journal
Volume
67
Issue
9
ISSN
1549-8328
Citations
1
PageRank
0.37
References
0
Authors
4
Name
Order
Citations
PageRank
Pouya Haghi151.80
Mehdi Kamal218930.41
Ali Afzali-Kusha38111.95
Massoud Pedram478011211.32