Title
CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs
Abstract
While deep neural network (DNN) models are often trained on GPUs, many companies and research institutes build GPU clusters that are shared by different groups. On such GPU clusters, DNN training jobs also require CPU cores to run pre-processing and gradient synchronization. Our investigation shows that the number of cores allocated to a training job significantly impacts its performance. To this end, we characterize representative deep learning models in terms of their CPU core requirements under different GPU resource configurations, and study the sensitivity of these models to other CPU-side shared resources. Based on this characterization, we propose CODA, a scheduling system comprised of an adaptive CPU allocator, a real-time contention eliminator, and a multi-array job scheduler. Experimental results show that CODA improves GPU utilization by 20.8% on average without increasing the queuing time of CPU jobs.
Year
2020
DOI
10.1109/ICDCS47774.2020.00069
Venue
2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)
Keywords
DNN training, CPU demand, resource utilization
DocType
Conference
ISSN
1063-6927
ISBN
978-1-7281-7003-9
Citations
1
PageRank
0.35
References
0
Authors
8
Name          Order  Citations  PageRank
Han Zhao      1      8          1.81
Weihao Cui    2      13         3.27
Quan Chen     3      175        21.86
Jingwen Leng  4      49         12.97
Kai Yu        5      25         4.47
Deze Zeng     6      49         8.68
Chao Li       7      344        37.85
Minyi Guo     8      3969       332.25