Title
---
FPGA-based low-batch training accelerator for modern CNNs featuring high bandwidth memory |
Abstract
---
Training convolutional neural networks (CNNs) requires intensive computation as well as a large amount of storage and memory access. While low-bandwidth off-chip memories have hindered system-level performance in prior FPGA works, modern FPGAs offer high bandwidth memory (HBM2) that unlocks opportunities to improve the throughput and energy efficiency of FPGA-based CNN training. This paper presents an FPGA accelerator for CNN training which (1) uses HBM2 for efficient off-chip communication, and (2) supports various training operations (e.g., residual connections, stride-2 convolutions) for modern CNNs. We analyze the impact of HBM2 on CNN training workloads, provide a comprehensive comparison with DDR3, and present strategies to efficiently use HBM2 features for enhanced CNN training performance. For training ResNet-20/VGG-like CNNs on the CIFAR-10 dataset with a low batch size of 2, the proposed CNN training accelerator on an Intel Stratix-10 MX FPGA demonstrates 1.4X/1.7X energy-efficiency improvement over a Stratix-10 GX FPGA with DDR3 memory, and 4.5X/9.7X energy-efficiency improvement over a Tesla V100 GPU.
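For context on the training operations the abstract names (residual connections, stride-2 convolutions, backpropagation at a batch size of 2), the sketch below is a minimal PyTorch illustration of the software-level computation being accelerated; it is not the paper's hardware design, and the channel counts and layer sizes are illustrative assumptions.

```python
# Minimal sketch (assumed shapes, not the paper's accelerator): one ResNet-style
# residual block with a stride-2 convolution, trained at batch size 2.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Residual block where the stride-2 conv downsamples the feature map;
    a 1x1 stride-2 conv on the shortcut keeps the residual addition valid."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))   # stride-2 conv halves H and W
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))   # residual connection

# Hypothetical mid-network feature map with CIFAR-10-like spatial size (32x32).
block = ResidualBlock(16, 32)
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))
model = nn.Sequential(block, head)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(2, 16, 32, 32)                 # low batch size of 2
y = torch.randint(0, 10, (2,))                 # 10 classes, as in CIFAR-10
loss = F.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()                                # backprop through stride-2 conv and residual add
opt.step()
```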
Year | DOI | Venue |
---|---|---|
2020 | 10.1145/3400302.3415643 | International Conference on Computer-Aided Design |
Keywords | DocType | ISSN |
Convolutional neural networks, neural network training, backpropagation, hardware accelerator, FPGA | Conference | 1933-7760
Citations | PageRank | References |
0 | 0.34 | 13 |
Authors
---
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shreyas K. Venkataramanaiah | 1 | 3 | 1.39 |
Han-Sok Suh | 2 | 0 | 0.34 |
Shihui Yin | 3 | 71 | 10.03 |
Eriko Nurvitadhi | 4 | 399 | 33.08 |
Aravind Dasu | 5 | 10 | 4.47 |
Yu Cao | 6 | 2765 | 245.91 |
Jae-sun Seo | 7 | 536 | 56.32 |