Title
Toward Multi-FPGA Acceleration of the Neural Networks
Abstract
High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on a single-FPGA design, which is limited by the available resources on an FPGA. In addition, they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., C3D CNN) and achieve a near-linear speedup with respect to the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4 times and 3D accelerators by up to 1.7 times.
Year
2021
DOI
10.1145/3432816
Venue
ACM Journal on Emerging Technologies in Computing Systems
Keywords
FPGA, neural networks, distributed systems
DocType
Journal
Volume
17
Issue
1
ISSN
1550-4832
Citations
2
PageRank
0.36
References
0
Authors
3
Name | Order | Citations | PageRank
Saman Biookaghazadeh | 1 | 13 | 2.54
Pravin Kumar Ravi | 2 | 1 | 0.36
Ziming Zhao | 3 | 322 | 30.52