**Abstract**

This paper proposes Mandheling, the first system that enables highly resource-efficient on-device training by orchestrating mixed-precision training with on-chip Digital Signal Processor (DSP) offloading. Mandheling fully explores the advantages of DSP in integer-based numerical calculations using four novel techniques: (1) a CPU-DSP co-scheduling scheme to situationally mitigate the overhead from DSP-unfriendly operators; (2) a self-adaptive rescaling algorithm to reduce the overhead of dynamic rescaling in backward propagation; (3) a batch-splitting algorithm to improve DSP cache efficiency; (4) a DSP compute subgraph-reusing mechanism to eliminate the preparation overhead on DSP. We have fully implemented Mandheling and demonstrated its effectiveness through extensive experiments. The results show that, compared to the state-of-the-art DNN engines from TFLite and MNN, Mandheling reduces per-batch training time by 5.5X and energy consumption by 8.9X on average. In end-to-end training tasks, Mandheling reduces convergence time by up to 10.7X and energy consumption by 13.1X, with only 1.9%--2.7% accuracy loss compared to the FP32 precision setting.
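To make the abstract's "dynamic rescaling" concrete: integer-based mixed-precision training maps each FP32 tensor to int8 using a scale derived from the tensor's current value range, and gradients in backward propagation change range every step, so the scale must be recomputed repeatedly (the overhead the paper's self-adaptive rescaling targets). The sketch below is an illustrative, generic symmetric int8 quantize/dequantize round-trip, not Mandheling's actual algorithm; the function names are hypothetical.

```python
# Illustrative sketch of per-tensor dynamic rescaling for int8 training.
# NOT Mandheling's implementation: just the generic symmetric quantization
# step whose repeated scale recomputation the paper's self-adaptive
# rescaling aims to avoid.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric int8 quantization with a dynamically computed scale."""
    # Scale maps the tensor's max magnitude onto the int8 range [-127, 127];
    # the floor guards against division by zero for all-zero tensors.
    scale = max(np.abs(x).max() / 127.0, 1e-8)
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation from the int8 tensor and its scale."""
    return q.astype(np.float32) * scale

# Round-trip a synthetic gradient tensor; rounding error is bounded by scale/2.
rng = np.random.default_rng(0)
grad = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_int8(grad)
err = np.abs(dequantize(q, scale) - grad).max()
print(f"max abs quantization error: {err:.4f} (scale={scale:.4f})")
```

Because the scale is recomputed from the live tensor each call, every backward step pays an extra full pass over the gradient; reducing how often that recomputation happens is the point of the self-adaptive rescaling technique.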
Year | DOI | Venue
---|---|---
2022 | 10.1145/3495243.3560545 | Mobile Computing and Networking
DocType | Citations | PageRank
---|---|---
Conference | 0 | 0.34
References | Authors
---|---
0 | 9
Name | Order | Citations | PageRank
---|---|---|---
Daliang Xu | 1 | 0 | 0.34 |
Mengwei Xu | 2 | 66 | 8.32 |
Qipeng Wang | 3 | 0 | 0.34 |
Shangguang Wang | 4 | 816 | 88.84 |
Yun Ma | 5 | 216 | 20.25 |
Kang Huang | 6 | 0 | 0.34 |
Gang Huang | 7 | 1223 | 110.80 |
Xin Jin | 8 | 0 | 0.34 |
Xuanzhe Liu | 9 | 689 | 57.53 |