Title
Specializing FGPU for Persistent Deep Learning
Abstract
Overlay architectures are a good way to enable fast development and debug on FPGAs at the expense of potentially limited performance compared to fully customized FPGA designs. When used in concert with hand-tuned FPGA solutions, performant overlay architectures can improve time-to-solution and thus overall productivity of FPGA solutions. This work tunes and specializes FGPU, an open-source, OpenCL-programmable GPU overlay for FPGAs. We demonstrate that our persistent deep learning (PDL)-FGPU architecture maintains the ease-of-programming and generality of GPU programming while achieving high performance from specialization for the persistent deep learning domain. We also propose an easy method to specialize for other domains. PDL-FGPU includes new instructions, along with micro-architecture and compiler enhancements. We evaluate both the FGPU baseline and the proposed PDL-FGPU on a modern high-end Intel Stratix 10 2800 FPGA in simulation running persistent DL applications (RNN, GRU, LSTM), and non-DL applications to demonstrate generality. PDL-FGPU requires 1.4–3× more ALMs, 4.4–6.4× more M20Ks, and 1–9.5× more DSPs than baseline, but improves performance by 56–693× for PDL applications with an average 23.1% degradation on non-PDL applications. We integrated the PDL-FGPU overlay into Intel OPAE to measure real-world performance/power and demonstrate that PDL-FGPU is only 4.0–10.4× slower than the Nvidia V100.
Year
2021
DOI
10.1145/3457886
Venue
ACM Transactions on Reconfigurable Technology and Systems
Keywords
Overlay, specialization, FPGA, GPU, soft GPU, persistent deep learning, RNN
DocType
Journal
Volume
14
Issue
2
ISSN
1936-7406
Citations
0
PageRank
0.34
References
0
Authors
10
Name                Order  Citations  PageRank
Rui Ma              1      1          1.02
Jia-Ching Hsu       2      0          0.34
Tian Tan            3      3          2.11
Eriko Nurvitadhi    4      0          0.34
David Sheffield     5      33         3.54
Rob Pelt            6      0          0.34
Martin Langhammer   7      104        20.22
Jaewoong Sim        8      384        17.25
Aravind Dasu        9      10         4.47
Derek Chiou         10     718        48.97