Title
Predictive Guardbanding: Program-Driven Timing Margin Reduction for GPUs
Abstract
The energy efficiency of GPU architectures has emerged as an essential aspect of computer system design. In this article, we explore the energy benefits of reducing the GPU chip's voltage to its safe limit, i.e., the Vmin point, using predictive software techniques. We perform this study on several commercial off-the-shelf GPU cards. We find that the studied GPUs, which span two architectural generations, have a voltage guardband of about 20%, which, if “eliminated” entirely, can yield up to 25% energy savings on one of the studied GPU cards. Our measurements reveal program-dependent Vmin behavior across the studied applications, and the exact magnitude of the improvement depends on each program's available guardband. We make fundamental observations about this program-dependent Vmin behavior. We experimentally determine that voltage noise has a more substantial impact on Vmin than process and temperature variation do, and that activity during kernel execution causes large voltage droops. Building on these findings, we show how to use a kernel's microarchitectural performance counters to predict its Vmin accurately. The average and maximum prediction errors are 0.5% and 3%, respectively.
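The abstract does not state the form of the counter-based predictor. As a minimal sketch, assuming a linear model over two hypothetical, normalized performance-counter rates and synthetic data (not the article's measurements), the code below fits Vmin by ordinary least squares, computes the per-kernel relative errors from which average and maximum prediction errors would be reported, and expresses the guardband as the fraction of a hypothetical nominal voltage that lies above Vmin.

```python
"""Sketch: predict a kernel's Vmin from performance counters (hypothetical features)."""
from typing import List, Sequence


def guardband(v_nominal: float, v_min: float) -> float:
    """Fraction of the nominal supply voltage that lies above Vmin (assumed definition)."""
    return (v_nominal - v_min) / v_nominal


def fit_linear(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    A bias column of 1s is prepended, so w[0] is the intercept.
    """
    rows = [[1.0, *f] for f in features]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights: Sequence[float], f: Sequence[float]) -> float:
    """Linear prediction: intercept plus weighted counter rates."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], f))


def relative_errors(pred: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Per-kernel |predicted - measured| / measured."""
    return [abs(p - a) / a for p, a in zip(pred, actual)]


if __name__ == "__main__":
    # Hypothetical per-kernel counter rates in [0, 1]; synthetic, not measured data.
    features = [(0.2, 0.1), (0.5, 0.3), (0.9, 0.2), (0.4, 0.8), (0.7, 0.6)]
    targets = [0.80 + 0.10 * x1 + 0.05 * x2 for x1, x2 in features]  # synthetic Vmin (V)
    w = fit_linear(features, targets)
    errs = relative_errors([predict(w, f) for f in features], targets)
    print("weights:", w)
    print("mean error:", sum(errs) / len(errs), "max error:", max(errs))
    print("guardband at a hypothetical 1.0 V nominal:", guardband(1.0, min(targets)))
```

A linear model is used here only because it is the simplest predictor that maps counter rates to a voltage; the synthetic targets are exactly linear, so the fit recovers the generating weights and the errors are near zero.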
Accurate Vmin prediction opens up new possibilities for a cross-layer dynamic guardbanding scheme for GPUs, in which software predicts and manages the voltage guardband while a hardware safety-net mechanism ensures functional correctness.
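The abstract does not specify how the software and hardware layers interact. The code below is a minimal sketch under stated assumptions: software sets the supply voltage to the predicted Vmin plus a hypothetical 3% margin (chosen to match the reported maximum prediction error), and a hypothetical `run_kernel(v)` callback stands in for launching the kernel and reporting whether the hardware safety net detected a timing error. On a detected error, the sketch raises the voltage by a hypothetical fixed step and re-executes, never exceeding the nominal voltage.

```python
"""Sketch: software-managed guardband with a hardware safety net (hypothetical interfaces)."""
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardbandPolicy:
    v_nominal: float      # conservative, fully guardbanded supply voltage (V)
    margin: float = 0.03  # hypothetical: covers the ~3% maximum prediction error
    step: float = 0.01    # hypothetical voltage increment after a detected error (V)


def run_with_safety_net(predicted_vmin: float,
                        run_kernel: Callable[[float], bool],
                        policy: GuardbandPolicy) -> float:
    """Run at a software-chosen voltage and back off upward when the safety net fires.

    run_kernel(v) returns True if the (hypothetical) safety net detected an error
    while running at supply voltage v. Returns the voltage that completed cleanly.
    """
    v = min(predicted_vmin * (1 + policy.margin), policy.v_nominal)
    while run_kernel(v):
        if v >= policy.v_nominal:
            raise RuntimeError("error detected even at nominal voltage")
        v = min(v + policy.step, policy.v_nominal)  # recover: raise voltage, re-execute
    return v


if __name__ == "__main__":
    policy = GuardbandPolicy(v_nominal=1.0)
    true_vmin = 0.85  # unknown to software; used only to simulate the safety net
    detect = lambda v: v < true_vmin
    print(run_with_safety_net(0.84, detect, policy))  # prediction plus margin suffices
    print(run_with_safety_net(0.80, detect, policy))  # under-prediction triggers recovery
```

Capping every retry at the nominal voltage keeps the worst case no worse than today's fully guardbanded operation, which is the property a safety net is meant to guarantee.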
Year: 2021
DOI: 10.1109/TCAD.2020.2992684
Venue: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Keywords: GPU; multicore processors; process, voltage, temperature (PVT) variation; single instruction and multiple data; voltage guardband
DocType: Journal
Volume: 40
Issue: 1
ISSN: 0278-0070
Citations: 1
PageRank: 0.39
References: 24
Authors: 6
Authors (in order): Jingwen Leng, Alper Buyuktosunoglu, Ramon Bertran, Pradip Bose, Yazhou Zu, Vijay Janapa Reddi