Title
FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems
Abstract
Training state-of-the-art artificial intelligence (AI) models requires scaling to many compute nodes and relies heavily on collective communication operations, such as all-reduce, to exchange the weight gradients between nodes. The overhead of these operations can bottleneck training performance as the number of nodes increases. In this paper, we first characterize the all-reduce operation overhead. Then, we propose a new smart network interface card (NIC) for distributed AI training using field-programmable gate arrays (FPGAs) to accelerate all-reduce operations and optimize bandwidth utilization via data compression. The AI smart NIC frees up the system's compute resources to perform the more compute-intensive tensor operations and increases the overall node-to-node communication efficiency. We build a prototype 6-node AI training system and show that our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6×, with an estimated 2.5× performance improvement at 32 nodes.
Year: 2022
DOI: 10.1109/LCA.2022.3189207
Venue: IEEE Computer Architecture Letters
Keywords: AI training, all-reduce, smart NIC, FPGA
DocType: Journal
Volume: 21
Issue: 2
ISSN: 1556-6056
Citations: 0
PageRank: 0.34
References: 4
Authors: 6
Name                 Order  Citations  PageRank
Rui Ma               1      100        20.94
Evangelos Georganas  2      2          1.04
Alexander Heinecke   3      0          0.34
Sergey Gribok        4      0          0.34
Andrew Boutros       5      8          3.02
Eriko Nurvitadhi     6      399        33.08