Abstract |
---|
Deep neural networks (DNNs) now power most intelligent applications and are deployed on many kinds of devices, but DNN inference is resource-intensive. In edge computing especially, inference must cope with the constrained computing resources of end devices and the excessive transmission cost of offloading raw data to the edge server. A better solution is DNN partitioning, which splits the DNN into two parts, one running on the end device and the other on the edge server. However, one edge server often serves multiple end devices simultaneously, which can cause excessive queueing delay. To meet the latency requirements of real-time DNN tasks, we combine the early-exit mechanism with DNN partitioning. We formally define DNN inference with partitioning and early-exit as an optimization problem, and we propose two efficient algorithms to determine the partition points of DNN partitioning and the thresholds of the early-exit mechanism. Extensive simulations show that the proposed algorithms dramatically accelerate DNN inference while achieving high accuracy. |
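The combination described in the abstract can be illustrated with a minimal sketch: the device runs the layers up to a partition point, consults an early-exit classifier, and only offloads the intermediate features to the server when the exit is not confident enough. This is a toy stand-in under assumed names (`infer`, `partition_point`, `exit_threshold`, the random `LAYERS`), not the paper's actual algorithms, which additionally optimize how these two knobs are chosen.

```python
import math
import random

random.seed(0)

# Toy stand-in for a DNN: a stack of small random linear layers with ReLU.
# All identifiers here are illustrative placeholders, not from the paper.
DIM = 8
NUM_CLASSES = 3
LAYERS = [[[random.gauss(0, 0.5) for _ in range(DIM)] for _ in range(DIM)]
          for _ in range(6)]
CLASSIFIER = [[random.gauss(0, 0.5) for _ in range(NUM_CLASSES)]
              for _ in range(DIM)]

def matvec(w, x):
    # y_j = sum_i x_i * w[i][j]
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def infer(x, partition_point=3, exit_threshold=0.8):
    """Layers 0..partition_point-1 run 'on the device'. If the early-exit
    classifier is confident enough, return immediately; otherwise the
    intermediate features are 'offloaded' and the rest runs 'on the server'."""
    h = x
    for w in LAYERS[:partition_point]:          # device-side computation
        h = relu(matvec(w, h))
    probs = softmax(matvec(CLASSIFIER, h))      # early-exit branch
    if max(probs) >= exit_threshold:
        return probs.index(max(probs)), "early-exit on device"
    for w in LAYERS[partition_point:]:          # server-side computation
        h = relu(matvec(w, h))
    probs = softmax(matvec(CLASSIFIER, h))
    return probs.index(max(probs)), "full inference via server"
```

The two parameters capture the trade-off the paper optimizes: a lower `exit_threshold` exits earlier (less latency, possibly less accuracy), while `partition_point` controls how much computation stays on the device versus how much data must be transmitted.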
Year | DOI | Venue
---|---|---
2021 | 10.1007/978-3-030-85928-2_37 | WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2021, PT I

Keywords | DocType | Volume
---|---|---
Edge computing, DNN inference, DNN partitioning, Early-exit | Conference | 12937

ISSN | Citations | PageRank
---|---|---
0302-9743 | 0 | 0.34

References | Authors
---|---
10 | 5
Name | Order | Citations | PageRank |
---|---|---|---
Chao Li | 1 | 0 | 0.34 |
Hongli Xu | 2 | 502 | 85.92 |
Yang Xu | 3 | 47 | 6.27 |
Zhiyuan Wang | 4 | 3 | 1.41 |
Liusheng Huang | 5 | 473 | 64.55 |