Title
A Scalable Priority-Aware Approach to Managing Data Center Server Power
Abstract
Power management is a key component of modern data center design. Power managers must (1) ensure the cost- and energy-efficient utilization of the data center infrastructure, (2) maintain availability of the services provided by the center, and (3) address environmental concerns associated with the center's power consumption. While several power management techniques have been proposed and deployed in production data centers, there are still many challenges to comprehensive data center power management. This is particularly true in public cloud environments, where different jobs have different priority levels, and where high availability is critical.
One example of the challenges facing public cloud data centers involves power capping. As power delivery must be highly reliable and tolerate wide variation in the load drawn by the data center components, the power infrastructure (e.g., power supplies, circuit breakers, UPS) has high redundancy and overprovisioning. During normal operation (i.e., typical server power demands, and no failures in the center), the power infrastructure is significantly underutilized. Power capping is a common solution to reduce this underutilization, by allowing more servers to be added safely (i.e., without power shortfalls) to the existing power infrastructure, and throttling power consumption in the infrequent cases where the demanded power exceeds the provisioned power capacity to avoid shortfalls. However, state-of-the-art power capping solutions are (1) not directly applicable to the redundant power infrastructure used in highly-available data centers; and (2) oblivious to differing workload priorities across the entire center when power consumption needs to be throttled, which can unnecessarily slow down high priority work.
To address this need, we develop CapMaestro, a new power management architecture with three key features for public cloud data centers.
First, CapMaestro is designed to work with multiple power feeds (i.e., sources), and exploits server-level power capping to independently cap the load on each feed of a server. Second, CapMaestro uses a scalable, global priority-aware power capping approach, which accounts for power capacity at each level of the power distribution hierarchy. It exploits the underutilization of commonly-employed redundant power infrastructure at each level of the hierarchy to safely accommodate a much greater number of servers. Third, CapMaestro exploits stranded power (i.e., power budgets that are not utilized) in redundant power infrastructure to boost the performance of workloads in the data center. We add CapMaestro to a real cloud data center control plane, and demonstrate the effectiveness of all three key features. Using a large-scale data center simulation, we demonstrate that CapMaestro significantly and safely increases the number of servers for existing infrastructure. We also call out other key technical challenges the industry faces in data center power management.
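The idea of priority-aware throttling mentioned in the abstract can be illustrated with a minimal sketch. This is not CapMaestro's algorithm: it covers a single power budget rather than a redundant, multi-level hierarchy, and every name, field, and number below (`Server`, `allocate_caps`, `min_w`, the wattages) is a hypothetical choice made for illustration. It assumes each server has a floor it cannot be throttled below, and that a larger `priority` value means more important work.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str          # assumed unique
    priority: int      # assumption: larger value = more important
    demand_w: float    # power the server would draw if uncapped
    min_w: float       # lowest power reachable under full throttling


def allocate_caps(servers, budget_w):
    """Return {name: cap_w} with caps summing to at most budget_w."""
    floor = sum(s.min_w for s in servers)
    if floor > budget_w:
        raise ValueError("budget cannot cover the minimum power of every server")

    # Every server first receives its floor, so nothing is starved.
    caps = {s.name: s.min_w for s in servers}
    remaining = budget_w - floor

    # Hand out the rest in descending priority order; sorted() is stable,
    # so servers with equal priority keep their input order.
    for s in sorted(servers, key=lambda s: -s.priority):
        extra = min(max(s.demand_w - s.min_w, 0.0), remaining)
        caps[s.name] += extra
        remaining -= extra
    return caps


servers = [
    Server("web", priority=2, demand_w=400.0, min_w=200.0),
    Server("batch", priority=1, demand_w=400.0, min_w=200.0),
]
caps = allocate_caps(servers, budget_w=700.0)
print(caps)  # {'web': 400.0, 'batch': 300.0}
```

With a 700 W budget and 800 W of total demand, the higher-priority server keeps its full 400 W and only the lower-priority one is throttled, which is the behavior a priority-oblivious scheme would not guarantee.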
Year
2019
DOI
10.1109/HPCA.2019.00067
Venue
2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Keywords
Servers, Power demand, Cloud computing, Power system management, Data center power, Feeds
Field
Computer science, Server, Computer network, Real-time computing, Power demand, Data center, Scalability, Cloud computing
DocType
Conference
ISSN
1530-0897
ISBN
978-1-7281-1444-6
Citations
0
PageRank
0.34
References
0
Authors
8
Name                  Order  Citations  PageRank
Yang Li               1      659        125.00
Charles R. Lefurgy    2      196        13.79
Karthick Rajamani     3      560        57.11
Malcolm Allen-Ware    4      107        5.15
Guillermo J. Silva    5      0          0.34
Daniel D. Heimsoth    6      0          0.34
Saugata Ghose         7      718        36.45
Onur Mutlu            8      9446       357.40