Title | ||
---|---|---|
TailCon: Power-Minimizing Tail Percentile Control of Response Time in Server Clusters |
Abstract | ||
---|---|---|
To provide a satisfactory customer experience, operators of modern server clusters such as Amazon usually define their Service Level Agreement (SLA) as a guarantee that a certain percentile (e.g., 99%) of customer requests have a response time within a threshold (e.g., 1 s). One way to meet the SLA constraint is to provision computing capacity for the worst-case workload estimate in the server cluster. However, this over-provisioning can cause unnecessary power consumption, especially when the workload is highly dynamic. In this paper, we propose an adaptive computing capacity allocation scheme referred to as TailCon. TailCon aims to minimize the power consumption of the server cluster while satisfying the SLA constraint by adjusting, online, the number of active servers and the CPU frequencies of the powered-on machines. In TailCon, we analyze the distribution of the request response time dynamically and use the measured request response time to estimate the workload intensity in the server cluster. This estimate serves as continuous feedback for finding the proper provisioning of computing capacity online using optimization techniques. We evaluate the performance of TailCon through both emulation using real-world HTTP traces and experiments. The experimental results demonstrate the effectiveness of the TailCon scheme in enforcing the SLA constraint while reducing power consumption. |
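
The control idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes each active server behaves as an M/M/1 queue with the load split evenly, that the service rate scales linearly with CPU frequency, and that per-server power is an idle term plus a cubic term in frequency. All function names and constants (`rate_per_ghz`, `idle_watts`, `cubic_coeff`) are hypothetical and chosen only for illustration.

```python
import math


def tail_response_time(arrival_rate, servers, freq_ghz,
                       percentile=0.99, rate_per_ghz=50.0):
    """Percentile response time (s) of one server, assuming M/M/1 queues
    and an even load split. rate_per_ghz is a hypothetical constant."""
    mu = rate_per_ghz * freq_ghz          # assumed service rate (req/s)
    lam = arrival_rate / servers          # per-server arrival rate (req/s)
    if lam >= mu:
        return math.inf                   # overloaded: no finite percentile
    # M/M/1 response time is exponential with rate (mu - lam).
    return -math.log(1.0 - percentile) / (mu - lam)


def estimate_arrival_rate(measured_tail_s, servers, freq_ghz,
                          percentile=0.99, rate_per_ghz=50.0):
    """Invert the model above: infer total workload intensity (req/s)
    from a measured tail response time (the feedback signal)."""
    mu = rate_per_ghz * freq_ghz
    return servers * (mu + math.log(1.0 - percentile) / measured_tail_s)


def server_power(freq_ghz, idle_watts=100.0, cubic_coeff=10.0):
    """Assumed per-server power model (W): idle power plus cubic term."""
    return idle_watts + cubic_coeff * freq_ghz ** 3


def cheapest_config(arrival_rate, max_servers, freqs_ghz,
                    threshold_s=1.0, percentile=0.99):
    """Exhaustive search for the lowest-power (power, servers, freq)
    that keeps the tail percentile within the threshold, or None."""
    best = None
    for m in range(1, max_servers + 1):
        for f in freqs_ghz:
            tail = tail_response_time(arrival_rate, m, f, percentile)
            if tail > threshold_s:
                continue                  # violates the SLA constraint
            power = m * server_power(f)
            if best is None or power < best[0]:
                best = (power, m, f)
    return best


if __name__ == "__main__":
    measured = tail_response_time(200.0, 3, 1.5)      # stand-in for a measurement
    workload = estimate_arrival_rate(measured, 3, 1.5)
    print(cheapest_config(workload, 10, [1.0, 1.5, 2.0]))
```

In a feedback loop, the measured tail latency would be fed to `estimate_arrival_rate` each control period and the resulting estimate passed to `cheapest_config`. The exhaustive search stands in for the optimization techniques the abstract mentions and is practical only for small configuration spaces.
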
Year | DOI | Venue |
---|---|---|
2012 | 10.1109/SRDS.2012.72 | SRDS |
Keywords | Field | DocType |
sla constraint,optimisation,response time,tailcon,active server,amazon,customer request,server cluster,power consumption,request response time,power-minimizing tail percentile control,computing capacity,tailcon scheme,workload estimation,modern server cluster,transport protocols,satisfactory customer experience,internet,adaptive computing capacity allocation,service level agreement,workload intensity,workstation clusters,server clusters,continuous feedback,real-world http traces,computing capacity online,cpu frequency,optimization technique | Server farm,Computer science,Workload,Service-level agreement,Server,Computer network,Response time,Real-time computing,Emulation,Computer cluster,Request–response,Distributed computing | Conference |
ISSN | ISBN | Citations |
1060-9857 | 978-1-4673-2397-0 | 8 |
PageRank | References | Authors |
0.49 | 14 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xi Chen | 1 | 333 | 70.76 |
Xue Liu | 2 | 3058 | 193.41 |
Shengquan Wang | 3 | 503 | 31.63 |
Xiao-Wen Chang | 4 | 208 | 24.85 |