Abstract |
---|
Interactive intelligent services (e.g., smart web search) are becoming essential datacenter workloads. They rely on data-intensive artificial intelligence (AI) algorithms that cannot use batch computation because of their tight latency constraints. Since off-chip data accesses have higher latency and energy consumption than on-chip accesses, a persistent AI approach, with the entire model stored in on-chip memory, is becoming the new norm for real-time AI. This approach is the cornerstone of Microsoft's Brainwave FPGA-based AI cloud and was recently added to Nvidia's cuDNN library. In this work, we implement, optimize, and evaluate a Brainwave-like neural processing unit (NPU) on a large Stratix-10 FPGA. We benchmark it against a large Nvidia Volta GPU running cuDNN persistent AI kernels. Across real-time persistent RNN, GRU, and LSTM workloads, we show that Stratix-10 offers ~3× (FP32) and ~10× (INT8) better latency than the GPU (FP32), and that the GPU uses only ~6% of its peak throughput. Then, we propose TensorRAM, an ASIC chiplet for persistent AI that is 2.5D integrated with an FPGA in the same package. TensorRAM enhances the on-chip memory capacity and bandwidth, with enough multi-precision INT8/4/2/1 throughput to match that bandwidth. Multiple TensorRAMs can be integrated with Stratix-10. Our evaluation shows that a small 32 mm² TensorRAM in 10 nm offers 64 MB of SRAM with 32 TB/s on-chiplet bandwidth and 64 TOP/s (INT8). A small Stratix-10 with a TensorRAM (INT8) offers 16× better latency and 34× better energy efficiency than the GPU (FP32). Overall, Stratix-10 with TensorRAM offers compelling and scalable persistent AI solutions.
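
The abstract's statement that TensorRAM has enough throughput to match its bandwidth can be checked with simple arithmetic on the three quoted figures. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) are assumed, and one multiply-accumulate is counted as two operations, which is a common convention but an assumption here.

```python
# Figures quoted in the abstract for one TensorRAM chiplet.
# Decimal units are an assumption; the abstract does not say MB vs. MiB.
SRAM_BYTES = 64e6            # 64 MB of on-chiplet SRAM
BANDWIDTH_BYTES_S = 32e12    # 32 TB/s on-chiplet bandwidth
INT8_OPS_S = 64e12           # 64 TOP/s at INT8

# Assumption: in a matrix-vector product each INT8 weight (1 byte) feeds
# one multiply-accumulate, counted as 2 operations.
MATVEC_OPS_PER_BYTE = 2


def ops_per_byte(ops_per_s, bytes_per_s):
    """Compute-to-bandwidth ratio: operations available per byte read."""
    return ops_per_s / bytes_per_s


def full_sweep_seconds(capacity_bytes, bytes_per_s):
    """Idealized time to read the whole SRAM once at full bandwidth."""
    return capacity_bytes / bytes_per_s


balance = ops_per_byte(INT8_OPS_S, BANDWIDTH_BYTES_S)
sweep_us = full_sweep_seconds(SRAM_BYTES, BANDWIDTH_BYTES_S) * 1e6

print(f"ops per byte available: {balance:.1f}")
print(f"ops per byte needed by INT8 matrix-vector: {MATVEC_OPS_PER_BYTE}")
print(f"idealized full-SRAM sweep: {sweep_us:.2f} microseconds")
```

Under these assumptions the chiplet provides 2 operations per byte, which equals what an INT8 matrix-vector product consumes, so neither compute nor bandwidth sits idle. The sweep time of about 2 microseconds is an idealized lower bound derived from the quoted figures, not a latency reported in the paper.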

Field | Value |
---|---|
Year | 2019 |
DOI | 10.1145/3289602.3293943 |
Venue | FPGA |
DocType | Conference |
ISBN | 978-1-4503-6137-8 |
Citations | 0 |
PageRank | 0.34 |
References | 0 |
Authors | 16 |

Name | Order | Citations | PageRank |
---|---|---|---|
Eriko Nurvitadhi | 1 | 399 | 33.08 |
Dongup Kwon | 2 | 25 | 4.92 |
Ali Jafari | 3 | 43 | 7.04 |
Andrew Boutros | 4 | 8 | 3.02 |
Jaewoong Sim | 5 | 384 | 17.25 |
Phillip Tomson | 6 | 6 | 0.94 |
Huseyin Sumbul | 7 | 6 | 2.29 |
Gregory K. Chen | 8 | 298 | 32.96 |
Phil V. Knag | 9 | 0 | 0.34 |
Raghavan Kumar | 10 | 73 | 12.56 |
Ram Krishnamurthy | 11 | 650 | 74.63 |
Debbie Marr | 12 | 175 | 12.39 |
Sergey Gribok | 13 | 9 | 3.78 |
Bogdan Pasca | 14 | 325 | 28.69 |
Martin Langhammer | 15 | 104 | 20.22 |
Aravind Dasu | 16 | 10 | 4.47 |