Abstract |
---|
AI inference accelerators have drawn extensive attention, but no previous work performs holistic and systematic benchmarking of them. First, an end-to-end AI inference pipeline consists of six stages spanning both the host and the accelerator, whereas previous work mainly evaluates hardware execution performance, which is only one stage on the accelerator. Second, there is no systematic evaluation of different optimizations on AI inference accelerators. Using six representative AI workloads and a typical AI inference accelerator, Diannao, based on the Cambricon ISA, we implement five frequently used AI inference optimizations as user-configurable hyper-parameters. We explore the optimization space by sweeping these hyper-parameters and quantifying each optimization's effect on the chosen metrics. We also provide cross-platform comparisons between Diannao and traditional platforms (Intel CPUs and Nvidia GPUs). Our evaluation yields several new observations and insights, which shed light on a comprehensive understanding of AI inference accelerators' performance and guide the co-design of upper-level optimizations and the underlying hardware architecture. |

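The hyper-parameter sweep the abstract describes can be sketched as a simple grid search over user-configurable optimization knobs. This is a minimal illustration only: the knob names (`batch_size`, `quantize`, `operator_fusion`) and the metric function are hypothetical stand-ins, not the five optimizations or metrics from the paper.

```python
from itertools import product

# Hypothetical knobs standing in for the paper's user-configurable
# inference optimizations (the actual names are not given here).
search_space = {
    "batch_size": [1, 8, 32],
    "quantize": [False, True],
    "operator_fusion": [False, True],
}

def measure_latency_ms(config):
    # Placeholder metric: a real harness would run the workload on the
    # accelerator and time it; here we return a synthetic score so the
    # sweep structure is runnable on its own.
    base = 10.0 / config["batch_size"] ** 0.5
    if config["quantize"]:
        base *= 0.6
    if config["operator_fusion"]:
        base *= 0.8
    return base

# Sweep the full Cartesian product of hyper-parameter settings and
# record each configuration's effect on the chosen metric.
results = {}
for values in product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    results[tuple(values)] = measure_latency_ms(config)

# The per-configuration results quantify each optimization's effect;
# the minimizer is the best setting under this (synthetic) metric.
best = min(results, key=results.get)
```

With three knobs of sizes 3, 2, and 2, the sweep evaluates 12 configurations; comparing results that differ in exactly one knob isolates that optimization's contribution.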
Year | DOI | Venue
---|---|---
2022 | 10.1007/s42514-022-00105-z | CCF Transactions on High Performance Computing

Keywords | DocType | Volume
---|---|---
AI accelerators, Inference, Performance evaluation, Optimization | Journal | 4

Issue | ISSN | Citations
---|---|---
2 | 2524-4922 | 0

PageRank | References | Authors
---|---|---
0.34 | 2 | 9

Name | Order | Citations | PageRank |
---|---|---|---|
Jiang Zihan | 1 | 0 | 0.34 |
Li Jiansong | 2 | 0 | 0.34 |
Liu Fangxin | 3 | 0 | 0.34 |
Wanling Gao | 4 | 299 | 19.12 |
Lei Wang | 5 | 577 | 46.85 |
Lan Chuanxin | 6 | 0 | 0.34 |
Tang Fei | 7 | 0 | 0.34 |
Liu Lei | 8 | 0 | 0.34 |
Li Tao | 9 | 0 | 0.34 |