Title
Toward Multi-FPGA Acceleration of the Neural Networks
Abstract
High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on a single-FPGA design, which is limited by the available resources on an FPGA. In addition, they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., C3D CNN) and achieve a near-linear speedup with respect to the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4 times and 3D accelerators by up to 1.7 times.
Year
2021
DOI
10.1145/3432816
Venue
ACM Journal on Emerging Technologies in Computing Systems
Keywords
FPGA, neural networks, distributed systems
DocType
Journal
Volume
17
Issue
1
ISSN
1550-4832
Citations
2
PageRank
0.36
References
0
Authors
3
Name | Order | Citations | PageRank
Saman Biookaghazadeh | 1 | 13 | 2.54
Pravin Kumar Ravi | 2 | 1 | 0.36
Ziming Zhao | 3 | 322 | 30.52