Title
GPU-Job Migration: The rCUDA Case
Abstract
Virtualization techniques have been shown to benefit data centers and other computing facilities. Not only do virtual machines make it possible to reduce the size of the computing infrastructure while increasing overall resource utilization, but virtualizing individual components of computers may also provide significant benefits. This is the case, for instance, for the remote GPU virtualization technique, which has been implemented in several frameworks in recent years. The high degree of flexibility provided by remote GPU virtualization can be further increased by applying a migration mechanism to it, so that the GPU part of an application can be transparently live-migrated to another GPU elsewhere in the cluster during execution. In this paper we present the implementation of the migration mechanism within the rCUDA remote GPU virtualization middleware, together with a thorough performance analysis of that implementation. To that end, we use both synthetic and real production applications, as well as three different generations of NVIDIA GPUs and two different versions of the InfiniBand interconnect. Several use cases are provided to show the extraordinary benefits that the GPU-job migration mechanism can bring to data centers.
Year: 2019
DOI: 10.1109/TPDS.2019.2924433
Venue: IEEE Transactions on Parallel and Distributed Systems
Keywords: Graphics processing units, Virtualization, Servers, Middleware, Proposals, Virtual machining, Resource management
Field: Virtualization, Resource management, Middleware, Use case, Virtual machine, InfiniBand, Computer science, Server, Interconnection, Distributed computing
DocType: Journal
Volume: 30
Issue: 12
ISSN: 1045-9219
Citations: 1
PageRank: 0.35
References: 0
Authors: 2
Name: Javier Prades; Order: 1; Citations: 30; PageRank: 7.15
Name: Federico Silla; Order: 2; Citations: 576; PageRank: 56.77