Abstract |
---|
More specialized chips are exploiting available high transistor density to expose parallelism at a large scale with more intricate instruction sets. This paper reports on a compilation system, GCD², developed to support complex Deep Neural Network (DNN) workloads on mobile DSP chips. We observe several challenges in fully exploiting this architecture, related to SIMD width, more complex SIMD/vector instructions, and a VLIW pipeline with the notion of soft dependencies. GCD² comprises the following contributions: 1) development of matrix layout formats that support the use of different novel SIMD instructions, 2) formulation and solution of a global optimization problem related to choosing the best instruction (and associated layout) for the implementation of each operator in a complete DNN, and 3) SDA, an algorithm for packing instructions with consideration for soft dependencies. These solutions are incorporated in a complete compilation system that is extensively evaluated against other systems using 10 large DNN models. Evaluation results show that GCD² outperforms two product-level state-of-the-art end-to-end DNN execution frameworks (TFLite and Qualcomm SNPE) that support mobile DSPs by up to $6.0\times$ speedup, and outperforms three established compilers (Halide, TVM, and RAKE) by up to $4.5\times$, $3.4\times$, and $4.0\times$ speedup, respectively. GCD² is also unique in supporting real-time execution of certain DNNs, while its implementation enables two major DNNs to execute on a mobile DSP for the first time. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/MICRO56248.2022.00044 | 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO) |
Keywords | DocType | ISBN |
VLIW instruction packing,compiler optimization,deep neural network,mobile devices | Conference | 978-1-6654-7428-3 |
Citations | PageRank | References |
0 | 0.34 | 47 |
Authors |
---|
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Wei Niu | 1 | 24 | 11.21 |
Jiexiong Guan | 2 | 0 | 0.34 |
Xipeng Shen | 3 | 2025 | 118.55 |
Yanzhi Wang | 4 | 1082 | 136.11 |
Gagan Agrawal | 5 | 2058 | 209.59 |
Bin Ren | 6 | 82 | 18.03 |