Abstract
---
Training state-of-the-art artificial intelligence (AI) models requires scaling to many compute nodes and relies heavily on collective communication operations, such as all-reduce, to exchange the weight gradients between nodes. The overhead of these operations can bottleneck training performance as the number of nodes increases. In this paper, we first characterize the all-reduce operation overhead. Then, we propose a new smart network interface card (NIC) for distributed AI training using field-programmable gate arrays (FPGAs) to accelerate all-reduce operations and optimize bandwidth utilization via data compression. The AI smart NIC frees up the system's compute resources to perform the more compute-intensive tensor operations and increases the overall node-to-node communication efficiency. We build a prototype 6-node AI training system and show that our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6×, with an estimated 2.5× performance improvement at 32 nodes.
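The abstract's central primitive, all-reduce, sums per-node gradient vectors so that every node ends up holding the same combined result. A minimal single-process sketch of the common bandwidth-optimal ring variant is shown below; the function name and simulation structure are illustrative assumptions, not code from the paper.

```python
# Minimal single-process sketch of ring all-reduce, the collective the
# paper accelerates. Each "node" is modeled as a list of gradient
# values; after the operation every node holds the element-wise sum.
# This simulation and its names are illustrative, not the paper's code.

def ring_all_reduce(node_grads):
    """node_grads: one equal-length gradient list per node."""
    n = len(node_grads)
    assert len(node_grads[0]) % n == 0, "vector must split into n chunks"
    c = len(node_grads[0]) // n
    buf = [list(g) for g in node_grads]
    sl = lambda k: slice(k * c, (k + 1) * c)  # indices of chunk k

    # Phase 1, reduce-scatter: in step s, node i sends chunk (i-s) mod n
    # to its ring neighbor, which accumulates it. After n-1 steps, node i
    # holds the fully reduced chunk (i+1) mod n.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, buf[i][sl((i - s) % n)]) for i in range(n)]
        for i, k, data in sends:
            dst = buf[(i + 1) % n]
            for t, v in zip(range(k * c, (k + 1) * c), data):
                dst[t] += v

    # Phase 2, all-gather: circulate the reduced chunks around the ring
    # so every node ends up with the complete summed vector.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, buf[i][sl((i + 1 - s) % n)])
                 for i in range(n)]
        for i, k, data in sends:
            buf[(i + 1) % n][sl(k)] = data
    return buf

# 4 nodes, 4-element gradients: every node ends with the sums.
grads = ring_all_reduce([[1, 2, 3, 4], [5, 6, 7, 8],
                         [9, 10, 11, 12], [13, 14, 15, 16]])
# each node now holds [28, 32, 36, 40]
```

Each ring step moves only 1/n of the gradient vector per link, so per-link traffic, and hence the benefit of the smart NIC's in-line compression, scales directly with gradient size rather than node count.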
Year | DOI | Venue
---|---|---
2022 | 10.1109/LCA.2022.3189207 | IEEE Computer Architecture Letters

Keywords | DocType | Volume
---|---|---
AI training, all-reduce, smart NIC, FPGA | Journal | 21

Issue | ISSN | Citations
---|---|---
2 | 1556-6056 | 0

PageRank | References | Authors
---|---|---
0.34 | 4 | 6

Name | Order | Citations | PageRank |
---|---|---|---|
Rui Ma | 1 | 100 | 20.94 |
Evangelos Georganas | 2 | 2 | 1.04 |
Alexander Heinecke | 3 | 0 | 0.34 |
Sergey Gribok | 4 | 0 | 0.34 |
Andrew Boutros | 5 | 8 | 3.02 |
Eriko Nurvitadhi | 6 | 399 | 33.08 |