Title
Exploration of GPU sharing policies under GEMM workloads
Abstract
Lately, cloud computing has seen explosive growth due to the flexibility and scalability it offers. The ever-increasing computational demands, especially from the machine learning domain, have forced cloud operators to enhance their infrastructure with acceleration devices, such as general-purpose GPUs (GPGPUs) or FPGAs. Even though multi-tenancy has been widely examined for conventional CPUs, this is not the case for accelerators. Current solutions support "one accelerator per user" schemes, which can lead to both under-utilization and starvation of available resources. In this work, we analyze the potential of GPU sharing inside data-center environments. We investigate how several architectural features affect the performance of GPUs under different multi-tenant stressing scenarios. We compare CUDA MPS with the native, default CUDA scheduler, as well as with Vinetalk, a research framework providing GPU sharing capabilities. Experimental results show that NVIDIA's MPS achieves the best performance in multi-application scenarios, up to 4.5× and 11.2× better than the native CUDA scheduler and Vinetalk, respectively.
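As context for the comparison above: unlike the default CUDA scheduler, NVIDIA's MPS (Multi-Process Service) is enabled system-side rather than per application, by starting a control daemon that lets kernels from multiple processes share a single GPU context. A minimal sketch follows; the device index is an illustrative assumption, and the GEMM clients themselves are elided:

```shell
# Start the NVIDIA MPS control daemon so that kernels launched from
# multiple processes can share the GPU through a common MPS server.
export CUDA_VISIBLE_DEVICES=0          # target GPU (illustrative choice)
nvidia-cuda-mps-control -d             # launch the control daemon

# ... run the concurrent GEMM workloads here; CUDA applications
# transparently attach to the MPS server instead of creating
# private per-process contexts ...

# Shut MPS down once the experiment finishes.
echo quit | nvidia-cuda-mps-control
```

With the daemon running, no application-side changes are needed; the same binaries used under the default scheduler attach to MPS automatically.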
Year: 2020
DOI: 10.1145/3378678.3391887
Venue: SCOPES '20: 23rd International Workshop on Software and Compilers for Embedded Systems, St. Goar, Germany, May 2020
Keywords: GPGPU sharing, native CUDA queues, CUDA MPS, Vinetalk, interference analysis, cloud computing
DocType: Conference
ISBN: 978-1-4503-7131-5
Citations: 1
PageRank: 0.37
References: 0
Authors: 5
Order | Name                    | Citations | PageRank
1     | Ioannis Oroutzoglou     | 3         | 0.75
2     | Dimosthenis Masouros    | 12        | 6.37
3     | Konstantina Koliogeorgi | 1         | 2.06
4     | Sotirios Xydis          | 144       | 31.51
5     | Dimitrios Soudris       | 243       | 48.41