Title
Band: coordinated multi-DNN inference on heterogeneous mobile processors
Abstract
The rapid development of deep learning algorithms, as well as innovative hardware advancements, encourages multi-DNN workloads such as augmented reality applications. However, existing mobile inference frameworks like TensorFlow Lite and MNN fail to efficiently utilize the heterogeneous processors available on mobile platforms, because they focus on running a single DNN on a specific processor. As mobile processors are too resource-limited to deliver reasonable performance for such workloads on their own, it is challenging to serve multi-DNN workloads with existing frameworks. This paper introduces Band, a new mobile inference system that coordinates multi-DNN workloads on heterogeneous processors. Band examines a DNN beforehand and partitions it into a set of subgraphs, taking operator dependencies into account. At runtime, Band dynamically selects a schedule of subgraphs from multiple possible schedules, following the scheduling goal of a pluggable scheduling policy. Fallback operators, which are not supported by certain mobile processors, are also considered when generating subgraphs. Evaluation results on mobile platforms show that our system outperforms TensorFlow Lite, a state-of-the-art mobile inference framework, by up to 5.04× for single-app workloads involving multiple DNNs. For a multi-app scenario consisting of latency-critical DNN requests, Band achieves an SLO satisfaction rate up to 3.76× higher.
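The abstract describes Band's core mechanism: a DNN is partitioned into dependency-aware subgraphs, and at runtime a pluggable policy picks which subgraph runs on which processor. The Python sketch below illustrates that idea only at a conceptual level; the Subgraph/schedule names, the latency numbers, and the greedy shortest-finish-time policy are illustrative assumptions for this sketch, not Band's actual API or scheduling algorithm.

```python
from dataclasses import dataclass

@dataclass
class Subgraph:
    name: str
    deps: list       # names of subgraphs that must complete first
    latency: dict    # estimated latency in ms, keyed by processor name

def schedule(subgraphs, processors):
    """Greedy shortest-expected-finish-time policy: repeatedly pick the
    ready (subgraph, processor) pair that would finish earliest."""
    finish = {}                                # subgraph name -> finish time
    busy_until = {p: 0.0 for p in processors}  # processor -> time it frees up
    pending = {s.name: s for s in subgraphs}
    plan = []

    def finish_time(s, p):
        # A subgraph may start only once its processor is free and all of
        # its dependencies have finished.
        start = max(busy_until[p], max((finish[d] for d in s.deps), default=0.0))
        return start + s.latency[p]

    while pending:
        ready = [s for s in pending.values() if all(d in finish for d in s.deps)]
        s, p = min(((s, p) for s in ready for p in processors if p in s.latency),
                   key=lambda sp: finish_time(*sp))
        end = finish_time(s, p)
        busy_until[p] = end
        finish[s.name] = end
        plan.append((s.name, p, end - s.latency[p], end))
        del pending[s.name]
    return plan

if __name__ == "__main__":
    # Toy model: the middle subgraph contains a fallback operator that one
    # processor cannot run, so it only lists GPU/NPU latencies.
    graph = [
        Subgraph("stem", [],       {"CPU": 4.0, "GPU": 2.0, "NPU": 1.5}),
        Subgraph("body", ["stem"], {"GPU": 6.0, "NPU": 3.0}),
        Subgraph("head", ["body"], {"CPU": 2.0, "GPU": 1.0}),
    ]
    for step in schedule(graph, ["CPU", "GPU", "NPU"]):
        print(step)
```

Swapping the key function passed to min() is where a different pluggable policy (e.g., one driven by SLO deadlines rather than raw finish time) would slot in under this sketch's assumptions.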
Year
2022
DOI
10.1145/3498361.3538948
Venue
Mobile Systems, Applications, and Services
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
7
Name             Order  Citations  PageRank
Joo Seong Jeong  1      13         3.23
Jingyu Lee       2      0          0.34
Donghyun Kim     3      0          0.34
Changmin Jeon    4      0          0.34
Changjin Jeong   5      0          0.34
Youngki Lee      6      832        70.33
Byung-Gon Chun   7      3832       234.37