Title
Automatic Compiler Based FPGA Accelerator for CNN Training
Abstract
Training convolutional neural networks (CNNs) on embedded platforms to support on-device learning has recently been gaining importance. Designing flexible training hardware is much more challenging than inference hardware because of the design complexity and the large computation and memory requirements. In this work, we present an automatic compiler-based FPGA accelerator with 16-bit fixed-point precision for complete CNN training, including the Forward Pass (FP), Backward Pass (BP), and Weight Update (WU). We implemented an optimized RTL library to perform training-specific tasks and developed an RTL compiler that automatically generates FPGA-synthesizable RTL based on user-defined constraints. We present a new cyclic weight storage/access scheme for on-chip BRAM and off-chip DRAM to efficiently implement the non-transpose and transpose operations required during the FP and BP phases, respectively. Representative CNNs for the CIFAR-10 dataset are implemented and trained on an Intel Stratix 10 GX FPGA using the proposed hardware architecture, demonstrating up to 479 GOPS of performance.
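For readers unfamiliar with the training-phase terminology used in the abstract, the sketch below shows, in plain NumPy floating point rather than the accelerator's 16-bit fixed-point RTL, what the FP, BP, and WU phases compute for one convolution layer. It also makes explicit that BP convolves with the channel-transposed, 180-degree-rotated weights, which is the transpose access pattern the paper's cyclic weight storage/access scheme is designed to serve. All function and variable names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def conv_fp(x, w):
    """FP: valid cross-correlation, stride 1.
    x: (C_in, H, W), w: (C_out, C_in, K, K) -> y: (C_out, H-K+1, W-K+1)."""
    C_out, C_in, K, _ = w.shape
    H_out, W_out = x.shape[1] - K + 1, x.shape[2] - K + 1
    y = np.zeros((C_out, H_out, W_out))
    for k in range(C_out):
        for i in range(H_out):
            for j in range(W_out):
                y[k, i, j] = np.sum(w[k] * x[:, i:i + K, j:j + K])
    return y

def conv_bp(dy, w):
    """BP: gradient w.r.t. the layer input. Equivalent to convolving the
    zero-padded output error with the 'transposed' weights: input/output
    channel dimensions swapped and each KxK kernel rotated by 180 degrees."""
    C_out, C_in, K, _ = w.shape
    w_t = np.transpose(w, (1, 0, 2, 3))[:, :, ::-1, ::-1]  # transpose + 180° rotation
    dy_pad = np.pad(dy, ((0, 0), (K - 1, K - 1), (K - 1, K - 1)))
    return conv_fp(dy_pad, w_t)

def conv_wu(x, dy, w, lr):
    """WU: accumulate the weight gradient dW from x and dy, then apply an SGD step."""
    C_out, C_in, K, _ = w.shape
    dw = np.zeros_like(w)
    for k in range(C_out):
        for c in range(C_in):
            for u in range(K):
                for v in range(K):
                    dw[k, c, u, v] = np.sum(
                        dy[k] * x[c, u:u + dy.shape[1], v:v + dy.shape[2]])
    return w - lr * dw

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 8, 8))     # input activations (C_in, H, W)
    w = rng.standard_normal((4, 3, 3, 3))  # weights (C_out, C_in, K, K)
    y = conv_fp(x, w)                      # FP: output activations
    dy = rng.standard_normal(y.shape)      # error from the next layer
    dx = conv_bp(dy, w)                    # BP: error to the previous layer
    w = conv_wu(x, dy, w, lr=0.01)         # WU: SGD weight update
```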
Year
2019
DOI
10.1109/FPL.2019.00034
Venue
2019 29th International Conference on Field Programmable Logic and Applications (FPL)
Keywords
Convolution neural networks, neural network training, back-propagation, hardware accelerator, FPGA
Field
Stratix, Computer architecture, Computer science, Convolutional neural network, Parallel computing, Field-programmable gate array, Code generation, Compiler, Artificial intelligence, Hardware acceleration, Deep learning, Hardware architecture
DocType
Conference
ISSN
1946-147X
ISBN
978-1-7281-4885-4
Citations
2
PageRank
0.46
References
0
Authors
7
Name                              Order  Citations  PageRank
Shreyas Kolala Venkataramanaiah   1      2          1.13
Yu-Fei Ma                         2      1166       63.05
Shihui Yin                        3      71         10.03
Eriko Nurvitadhi                  4      399        33.08
Aravind Dasu                      5      10         4.47
Yu Cao                            6      2765       245.91
Jae-sun Seo                       7      536        56.32