A Power-Aware Symbiotic Scheduling Algorithm for Concurrent GPU Kernels - Citegraph

Paper Info

Title
A Power-Aware Symbiotic Scheduling Algorithm for Concurrent GPU Kernels

Abstract
The past several years have witnessed significant performance improvements in High-Performance Computing (HPC), due to the incorporation of GPUs as co-processors. On one hand, GPU devices are growing significantly in terms of the available number of cores and the memory hierarchy; as a result, effective utilization of the available GPU resources while limiting the system power consumption has become an issue of rising importance. On the other hand, GPU vendors are providing additional supporting features to make this easier, such as enabling concurrent execution of multiple kernels, and providing on-board power sensors that can be accessed through software. Amidst these new developments, we are faced with new opportunities for efficiently scheduling GPU computational kernels under performance and power constraints. In this paper, we propose a power-aware scheduling technique that carries out both performance and power optimizations for concurrent GPU kernels. We have observed that for GPU kernels that are deployed for concurrent execution, the order in which the programmer specifies their invocation can significantly alter the execution time and the power draw. We attribute this behavior to the relative synergy (or lack thereof) among kernels that are launched within close proximity of each other. Accordingly, we define performance metrics for computing the extent to which kernels are symbiotic, as well as power metrics for reducing the overall power consumption. Both metrics are estimated by modeling the kernels' complementary resource requirements and execution characteristics. We then propose a power-aware symbiotic scheduling algorithm to obtain a concurrent kernel launch schedule with improved performance and reduced power consumption. Experimental studies are conducted on the Cray XK7 supercomputer with an NVIDIA K20 GPU in each node. The results demonstrate the efficacy of the proposed algorithm-based approach, which can be readily adopted by programmers with minimal programming effort and risk.

Year	DOI	Venue
2015	10.1109/ICPADS.2015.76	International Conference on Parallel and Distributed Systems
Keywords	Field	DocType
GPU, CUDA, Concurrent Kernel Execution, Performance, Power, Scheduling, Algorithm, High-Performance Computing	Kernel (linear algebra), Memory hierarchy, Programmer, Supercomputer, Computer science, Scheduling (computing), CUDA, Parallel computing, Real-time computing, Software, Cray XK7, Distributed computing	Conference
ISSN	Citations	PageRank
1521-9097	5	0.46
References	Authors
8	3

Authors (3 rows)

Name	Order	Citations	PageRank
Teng Li	1	53	5.40
Vikram K. Narayana	2	102	13.18
Tarek El-Ghazawi	3	697	84.30
