Title
---
FPGA-based low-batch training accelerator for modern CNNs featuring high bandwidth memory |
Abstract
---
Training convolutional neural networks (CNNs) requires intensive computation as well as a large amount of storage and memory access. While low-bandwidth off-chip memories have hindered system-level performance in prior FPGA works, modern FPGAs offer high bandwidth memory (HBM2) that unlocks opportunities to improve the throughput and energy efficiency of FPGA-based CNN training. This paper presents an FPGA accelerator for CNN training which (1) uses HBM2 for efficient off-chip communication, and (2) supports various training operations (e.g., residual connections, stride-2 convolutions) for modern CNNs. We analyze the impact of HBM2 on CNN training workloads, provide a comprehensive comparison with DDR3, and present strategies to efficiently use HBM2 features for enhanced CNN training performance. For training ResNet-20/VGG-like CNNs on the CIFAR-10 dataset with a low batch size of 2, the proposed CNN training accelerator on an Intel Stratix-10 MX FPGA demonstrates 1.4X/1.7X energy-efficiency improvement over a Stratix-10 GX FPGA with DDR3 memory, and 4.5X/9.7X energy-efficiency improvement over a Tesla V100 GPU.
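For context on the training operations the abstract names (residual connections, stride-2 convolutions, backpropagation at a batch size of 2), the sketch below is a minimal PyTorch illustration of the software-level computation being accelerated; it is not the paper's hardware design, and the channel counts and layer sizes are illustrative assumptions.

```python
# Minimal sketch (assumed shapes, not the paper's accelerator): one ResNet-style
# residual block with a stride-2 convolution, trained at batch size 2.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Residual block where the stride-2 conv downsamples the feature map;
    a 1x1 stride-2 conv on the shortcut keeps the residual addition valid."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))   # stride-2 conv halves H and W
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))   # residual connection

# Hypothetical mid-network feature map with CIFAR-10-like spatial size (32x32).
block = ResidualBlock(16, 32)
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))
model = nn.Sequential(block, head)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(2, 16, 32, 32)                 # low batch size of 2
y = torch.randint(0, 10, (2,))                 # 10 classes, as in CIFAR-10
loss = F.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()                                # backprop through stride-2 conv and residual add
opt.step()
```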
Year | DOI | Venue |
---|---|---|
2020 | 10.1145/3400302.3415643 | International Conference on Computer-Aided Design |
Keywords | DocType | ISSN |
Convolutional neural networks, neural network training, backpropagation, hardware accelerator, FPGA | Conference | 1933-7760
Citations | PageRank | References |
0 | 0.34 | 13 |
Authors
---
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shreyas K. Venkataramanaiah | 1 | 3 | 1.39 |
Han-Sok Suh | 2 | 0 | 0.34 |
Shihui Yin | 3 | 71 | 10.03 |
Eriko Nurvitadhi | 4 | 399 | 33.08 |
Aravind Dasu | 5 | 10 | 4.47 |
Yu Cao | 6 | 2765 | 245.91 |
Jae-sun Seo | 7 | 536 | 56.32 |