Layerweaver+: A QoS-Aware Layer-Wise DNN Scheduler for Multi-Tenant Neural Processing Units - Citegraph

Paper Info

Title
Layerweaver+: A QoS-Aware Layer-Wise DNN Scheduler for Multi-Tenant Neural Processing Units

Abstract
Many cloud service providers employ specialized hardware accelerators, called neural processing units (NPUs), to accelerate deep neural networks (DNNs). An NPU scheduler is responsible for scheduling incoming user requests and is required to satisfy two, often conflicting, optimization goals: maximizing system throughput and satisfying quality-of-service (QoS) constraints (e.g., deadlines) of individual requests. We propose Layerweaver+, a low-cost layer-wise DNN scheduler for NPUs, which provides both high system throughput and minimal QoS violations. For a serving scenario based on the industry-standard MLPerf inference benchmark, Layerweaver+ significantly improves the system throughput by up to 266.7% over the baseline scheduler serving one DNN at a time.

Year	2022
DOI	10.1587/transinf.2021EDL8084
Venue	IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
Keywords	inference serving system, neural networks, multi-tasking
DocType	Journal
Volume	E105D
Issue	2
ISSN	1745-1361
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Young H. Oh	1	0	0.34
Yunho Jin	2	1	1.71
Tae Jun Ham	3	4	3.76
Jae W. Lee	4	0	0.34
