Title
Flex: High-Availability Datacenters With Zero Reserved Power
Abstract
Cloud providers, such as Amazon and Microsoft, must guarantee high availability for a large fraction of their workloads. For this reason, they build datacenters with redundant infrastructure for power delivery and cooling. Typically, the redundant resources are reserved for use only during infrastructure failure or maintenance events, so that workload performance and availability do not suffer. Unfortunately, the reserved resources also lower power utilization and, consequently, require more datacenters to be built. To address these problems, we propose "zero-reserved-power" datacenters and the Flex system, which ensures that workloads still receive their desired performance and availability. Flex leverages the existence of software-redundant workloads that can tolerate lower infrastructure availability, while imposing minimal (if any) performance degradation on those that require high infrastructure availability. Flex mainly comprises (1) a new offline workload placement policy that reduces stranded power while ensuring safety during failure or maintenance events, and (2) a distributed system that monitors for failures and, when it detects one, quickly reduces the power draw while respecting the workloads’ requirements. Our evaluation shows that Flex produces less than 5% stranded power and increases the number of deployed servers by up to 33%, which translates to hundreds of millions of dollars in construction cost savings per datacenter site. We end the paper with lessons from our experience bringing Flex to production in Microsoft’s datacenters.
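The "up to 33%" figure can be sanity-checked with simple arithmetic. The sketch below is illustrative only: it assumes that one quarter of the provisioned power is held in reserve, a value the abstract does not state, and the function names are hypothetical rather than part of Flex.

```python
def extra_server_fraction(reserved_fraction: float) -> float:
    """Fractional increase in deployable servers if reserved power is allocated.

    With a fraction r of provisioned power held in reserve, only (1 - r) is
    usable. Allocating all of it scales deployment by 1 / (1 - r), which is
    an increase of r / (1 - r).
    """
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    return reserved_fraction / (1.0 - reserved_fraction)


def power_to_shed_kw(allocated_kw: float, surviving_capacity_kw: float) -> float:
    """Power draw that must be removed after a failure to stay within capacity."""
    return max(0.0, allocated_kw - surviving_capacity_kw)


# Assumption (not from the abstract): 25% of provisioned power is reserved.
increase = extra_server_fraction(0.25)
print(f"extra servers: {increase:.1%}")

# Hypothetical numbers: 1000 kW fully allocated, 750 kW survives a failure.
print(f"must shed: {power_to_shed_kw(1000.0, 750.0):.0f} kW")
```

Under that assumption, allocating the reserve yields about one third more servers, and the same quarter of the load must be shed quickly when a failure removes the redundant capacity.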
Year
2021
DOI
10.1109/ISCA52012.2021.00033
Venue
2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)
Keywords
Datacenter power management, redundant power, power capping, workload availability
DocType
Conference
ISSN
1063-6897
ISBN
978-1-6654-3334-1
Citations
0
PageRank
0.34
References
0
Authors
15