Title
A Power-Aware Symbiotic Scheduling Algorithm for Concurrent GPU Kernels
Abstract
The past several years have witnessed significant performance improvements in High-Performance Computing (HPC), due to the incorporation of GPUs as co-processors. On one hand, GPU devices are growing significantly in terms of the available number of cores and the memory hierarchy; as a result, effective utilization of the available GPU resources while limiting the system power consumption has become an issue of rising importance. On the other hand, GPU vendors are providing additional supporting features to make this easier, such as enabling concurrent execution of multiple kernels, and providing on-board power sensors that can be accessed through software. Amidst these new developments, we are faced with new opportunities for efficiently scheduling GPU computational kernels under performance and power constraints. In this paper, we propose a power-aware scheduling technique that carries out both performance and power optimizations for concurrent GPU kernels. We have observed that for GPU kernels that are deployed for concurrent execution, the order in which the programmer specifies their invocation can significantly alter the execution time and the power draw. We attribute this behavior to the relative synergy (or lack thereof) among kernels that are launched within close proximity of each other. Accordingly, we define performance metrics for computing the extent to which kernels are symbiotic, as well as power metrics for reducing the overall power consumption. Both metrics are estimated by modeling the kernels' complementary resource requirements and execution characteristics. We then propose a power-aware symbiotic scheduling algorithm to obtain a concurrent kernel launch schedule with improved performance and reduced power consumption. Experimental studies are conducted on the Cray XK7 supercomputer with an NVIDIA K20 GPU in each node. The results demonstrate the efficacy of the proposed algorithm-based approach, which can be readily adopted by programmers with minimal programming effort and risk.
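The idea that launch order matters because neighbouring kernels may or may not have complementary resource needs can be illustrated with a small sketch. This is not the paper's algorithm or its metrics: the `Kernel` fields, the `symbiosis` score, and the greedy ordering below are all hypothetical stand-ins, and power is not modeled at all. The sketch only shows how a launch order might be chosen so that adjacent kernels stress different resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    name: str
    compute: float  # hypothetical normalized compute demand in [0, 1]
    memory: float   # hypothetical normalized memory-bandwidth demand in [0, 1]


def symbiosis(a: Kernel, b: Kernel) -> float:
    """Hypothetical score: 0 when the pair fits without oversubscribing
    either resource, increasingly negative as they compete."""
    over_compute = max(0.0, a.compute + b.compute - 1.0)
    over_memory = max(0.0, a.memory + b.memory - 1.0)
    return -(over_compute + over_memory)


def greedy_order(kernels: list[Kernel]) -> list[Kernel]:
    """Order kernels so each one is followed by its most symbiotic
    remaining neighbour (a simple greedy heuristic, assumed here)."""
    remaining = list(kernels)
    # Arbitrary starting choice: the most compute-heavy kernel.
    order = [max(remaining, key=lambda k: k.compute)]
    remaining.remove(order[0])
    while remaining:
        nxt = max(remaining, key=lambda k: symbiosis(order[-1], k))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Illustrative inputs only; the demand values are made up.
kernels = [
    Kernel("A", compute=0.9, memory=0.1),
    Kernel("B", compute=0.8, memory=0.2),
    Kernel("C", compute=0.1, memory=0.9),
    Kernel("D", compute=0.2, memory=0.8),
]
order = [k.name for k in greedy_order(kernels)]
```

With these made-up demands, the heuristic interleaves compute-bound and memory-bound kernels rather than launching similar kernels back to back. A real scheduler would derive the scores from measured or modeled kernel characteristics and would also account for power draw.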
Year: 2015
DOI: 10.1109/ICPADS.2015.76
Venue: International Conference on Parallel and Distributed Systems
Keywords: GPU, CUDA, Concurrent Kernel Execution, Performance, Power, Scheduling, Algorithm, High-Performance Computing
Field: Kernel (linear algebra), Memory hierarchy, Programmer, Supercomputer, Computer science, Scheduling (computing), CUDA, Parallel computing, Real-time computing, Software, Cray XK7, Distributed computing
DocType: Conference
ISSN: 1521-9097
Citations: 5
PageRank: 0.46
References: 8
Authors: 3
Name                  Order  Citations  PageRank
Teng Li               1      53         5.40
Vikram K. Narayana    2      102        13.18
Tarek El-Ghazawi      3      697        84.30