Title
DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing
Abstract
Recently, deep neural networks (DNNs) have been applied to most intelligent applications and deployed on many kinds of devices. However, DNN inference is resource-intensive. In edge computing in particular, inference must contend with the constrained computing resources of end devices and the excessive cost of transmitting raw data to the edge server. A better solution is DNN partitioning, which splits a DNN into two parts: one running on the end device and the other on the edge server. However, one edge server often serves multiple end devices simultaneously, which may cause excessive queueing delay. To meet the latency requirements of real-time DNN tasks, we combine the early-exit mechanism with DNN partitioning. We formally define DNN inference with partitioning and early exiting as an optimization problem. To solve it, we propose two efficient algorithms that determine the partition points for DNN partitioning and the thresholds for the early-exit mechanism. Extensive simulations show that our algorithms dramatically accelerate DNN inference while achieving high accuracy.
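To make the mechanism concrete, below is a minimal, hypothetical PyTorch sketch (not the paper's implementation) of partition-plus-early-exit inference: an on-device head produces early-exit logits, and only if the softmax confidence falls below a threshold are the intermediate features forwarded to the server-side layers. The class name, layer shapes, and threshold value are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartitionedEarlyExitNet(nn.Module):
    """Hypothetical two-part DNN with one early exit at the partition point."""

    def __init__(self, device_layers, exit_head, server_layers, threshold):
        super().__init__()
        self.device_layers = device_layers  # part that runs on the end device
        self.exit_head = exit_head          # early-exit classifier head
        self.server_layers = server_layers  # part offloaded to the edge server
        self.threshold = threshold          # early-exit confidence threshold

    def forward(self, x):
        features = self.device_layers(x)
        early_logits = self.exit_head(features)
        # Max softmax probability as the confidence score (assumes batch size 1).
        confidence = F.softmax(early_logits, dim=1).max(dim=1).values.item()
        if confidence >= self.threshold:
            # Confident enough: exit early, nothing is sent to the server.
            return early_logits
        # Otherwise offload the intermediate features (smaller than raw input)
        # to the server-side layers; modeled here as a local call.
        return self.server_layers(features)

# Illustrative usage with made-up shapes and a made-up threshold.
net = PartitionedEarlyExitNet(
    device_layers=nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU()),
    exit_head=nn.Linear(64, 10),
    server_layers=nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)),
    threshold=0.8,
)
logits = net(torch.randn(1, 1, 28, 28))
```

Raising the threshold trades latency for accuracy: more inputs are offloaded to the deeper server-side part, which is exactly the trade-off the paper's threshold-selection algorithm targets.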
Year
2021
DOI
10.1007/978-3-030-85928-2_37
Venue
WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2021, PT I
Keywords
Edge computing, DNN inference, DNN partitioning, Early-exit
DocType
Conference
Volume
12937
ISSN
0302-9743
Citations
0
PageRank
0.34
References
10
Authors
5
Name            Order  Citations  PageRank
Chao Li         1      0          0.34
Hongli Xu       2      5028       5.92
Yang Xu         3      47         6.27
Zhiyuan Wang    4      3          1.41
Liusheng Huang  5      4736       4.55