Title
Comparing GPU Power and Frequency Capping: A Case Study with the MuMMI Workflow
Abstract
Achieving exascale computing under a potential power limit requires HPC clusters to maximize both parallel efficiency and power efficiency. As modern HPC systems trend toward extreme heterogeneity with multiple GPUs per node, power management becomes even more challenging, especially when catering to scientific workflows with co-scheduled components. The impact of managing GPU power on workflow performance and run-to-run reproducibility has not been adequately studied. In this paper, we present a first-of-its-kind study of the two power management knobs available on NVIDIA Volta GPUs: frequency capping and power capping. We analyzed performance and power metrics of GPUs on a top-10 supercomputer by tuning these knobs across more than 5,300 runs of a scientific workflow. Our data show that GPU power capping in a scientific workflow is an effective way of improving power efficiency while preserving performance, whereas GPU frequency capping is a demonstrably unpredictable way of reducing power consumption. Additionally, we found that frequency capping results in higher variation and anomalous behavior on GPUs, which is contrary to what has been observed in research on CPUs.
Year
2019
DOI
10.1109/WORKS49585.2019.00009
Venue
2019 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS)
Keywords
Workflows, Cancer MuMMI, GPU power capping, GPU frequency capping, Performance, Variation
Field
Exascale computing, Power limits, Electrical efficiency, Power management, Supercomputer, Computer science, Workflow, Power consumption, Distributed computing, Anomalous behavior
DocType
Conference
ISBN
978-1-7281-5998-0
Citations
0
PageRank
0.34
References
28
Authors
7
Name                  Order  Citations  PageRank
Tapasya Patki         1      137        8.98
Zachary Frye          2      0          0.34
Harsh Bhatia          3      85         8.99
Francesco Di Natale   4      6          2.14
James N. Glosli       5      34         5.04
Helgi I. Ingólfsson   6      2          1.05
Barry Rountree        7      1013       51.24