Title | ||
---|---|---|
Evaluation of Optimized CNNs on FPGA and non-FPGA based Accelerators using a Novel Benchmarking Approach. |
Abstract | ||
---|---|---|
Numerous algorithmic optimization techniques have been proposed to alleviate the computational complexity of convolutional neural networks (CNNs). However, given the broad selection of inference accelerators, it is not obvious which approach benefits from which optimization and to what degree. In addition, the design space is further obscured by many deployment settings, such as power and operating modes and batch sizes, as well as by ill-defined measurement methodologies. In this paper, we use a novel benchmarking approach to systematically benchmark different types of CNNs, applying pruning and quantization as the most promising optimization techniques. We evaluate a spectrum of FPGA implementations, as well as GPU, TPU, and VLIW processors, for a selection of systematically pruned and quantized neural networks (including ResNet50, GoogleNetv1, MobileNetv1, a VGG derivative, and a multilayer perceptron). The evaluation takes the full design space into account, including batch sizes, thread counts, stream sizes, and operating modes, and considers power, latency, and throughput at a specific accuracy as figures of merit. Our findings show that channel pruning is effective across most hardware platforms, with resulting speedups directly correlated to the reduction in compute load, while FPGAs benefit the most from quantization. FPGAs outperform the other platforms in latency and latency variation for the majority of CNNs, in particular with feed-forward dataflow implementations. Finally, pruning and quantization are orthogonal techniques and, when combined, yield the majority of all optimal design points. With this benchmarking approach, both in terms of methodology and measured results, we aim to bring more clarity to the choice of CNN implementations and optimizations. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1145/3373087.3375348 | FPGA |
Field | DocType | ISBN |
Computer architecture,Computer science,Parallel computing,Field-programmable gate array,Benchmarking | Conference | 978-1-4503-7099-8 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Michaela Blott | 1 | 315 | 25.60 |
Johannes Kath | 2 | 0 | 0.34 |
Lisa Halder | 3 | 2 | 1.04 |
Yaman Umuroglu | 4 | 186 | 10.67 |
Nicholas J. Fraser | 5 | 177 | 12.85 |
Giulio Gambardella | 6 | 200 | 13.13 |
Miriam Leeser | 7 | 12 | 3.65 |
Linda Doyle | 8 | 20 | 2.01 |