Title
TailCon: Power-Minimizing Tail Percentile Control of Response Time in Server Clusters
Abstract
To provide a satisfactory customer experience, operators of modern server clusters such as Amazon's usually define the Service Level Agreement (SLA) as a guarantee that a certain percentile (e.g., 99%) of customer requests have a response time within a threshold (e.g., 1 s). One way to meet the SLA constraint is to provision enough computing capacity for the worst-case workload estimate. However, this over-provisioning may cause unnecessary power consumption, especially when the workload is highly dynamic. In this paper, we propose an adaptive computing capacity allocation scheme referred to as TailCon. TailCon aims to minimize the power consumption of the server cluster while satisfying the SLA constraint by adjusting, online, the number of active servers and the CPU frequencies of the powered-on machines. In TailCon, we analyze the distribution of the request response time dynamically and use the measured request response time to estimate the workload intensity in the server cluster; this estimate serves as continuous feedback for finding the proper provision of computing capacity online using optimization techniques. We evaluate TailCon both through emulation using real-world HTTP traces and through experiments. The experimental results demonstrate the effectiveness of the TailCon scheme in enforcing the SLA constraint while reducing power consumption.
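The abstract does not give TailCon's actual models, so the following is only a minimal sketch of the general idea under stated assumptions: each active server is treated as an M/M/1 queue with the load split evenly, the service rate scales linearly with a normalized CPU frequency, and power follows a hypothetical idle-plus-cubic model. All function names, parameters, and constants are illustrative and are not taken from the paper.

```python
import math


def tail_response_time(arrival_rate, service_rate, percentile):
    """Percentile of the response time of an M/M/1 queue.

    In an M/M/1 FCFS queue the response time is exponential with rate
    (service_rate - arrival_rate), so the p-th percentile is
    -ln(1 - p) / (service_rate - arrival_rate).
    """
    if arrival_rate >= service_rate:
        return math.inf  # unstable queue: no finite percentile
    return -math.log(1.0 - percentile) / (service_rate - arrival_rate)


def estimate_arrival_rate(measured_tail, service_rate, percentile):
    """Invert the formula above: infer the per-server arrival rate
    from a measured tail response time (the feedback signal)."""
    return service_rate + math.log(1.0 - percentile) / measured_tail


def server_power(freq, p_idle=100.0, p_dyn=100.0):
    """Assumed power model (watts): idle power plus a cubic dynamic term.
    freq is a normalized CPU frequency in (0, 1]."""
    return p_idle + p_dyn * freq ** 3


def cheapest_config(total_rate, max_servers, freqs, mu_max, percentile, threshold):
    """Exhaustively search (servers, frequency) pairs and return the
    lowest-power pair whose tail response time meets the threshold.
    Returns (power, servers, freq), or None if no pair is feasible."""
    best = None
    for n in range(1, max_servers + 1):
        for f in freqs:
            tail = tail_response_time(total_rate / n, mu_max * f, percentile)
            if tail <= threshold:
                power = n * server_power(f)
                if best is None or power < best[0]:
                    best = (power, n, f)
    return best


if __name__ == "__main__":
    # 300 requests/s in total, up to 10 servers, three frequency levels,
    # 100 requests/s per server at full frequency, 99th percentile within 1 s.
    print(cheapest_config(300.0, 10, [0.5, 0.75, 1.0], 100.0, 0.99, 1.0))
```

A controller in this spirit would call `estimate_arrival_rate` on each measurement interval and rerun `cheapest_config` with the updated load. Exhaustive search is used here only for clarity. The paper refers to optimization techniques without specifying them in the abstract.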
Year: 2012
DOI: 10.1109/SRDS.2012.72
Venue: SRDS
Keywords: sla constraint, optimisation, response time, tailcon, active server, amazon, customer request, server cluster, power consumption, request response time, power-minimizing tail percentile control, computing capacity, tailcon scheme, workload estimation, modern server cluster, transport protocols, satisfactory customer experience, internet, adaptive computing capacity allocation, service level agreement, workload intensity, workstation clusters, server clusters, continuous feedback, real-word http traces, computing capacity online, cpu frequency, optimization technique
Field: Server farm, Computer science, Workload, Service-level agreement, Server, Computer network, Response time, Real-time computing, Emulation, Computer cluster, Request–response, Distributed computing
DocType: Conference
ISSN: 1060-9857
ISBN: 978-1-4673-2397-0
Citations: 8
PageRank: 0.49
References: 14
Authors: 4
Name              Order  Citations  PageRank
Xi Chen           1      333        70.76
Xue Liu           2      3058       193.41
Shengquan Wang    3      503        31.63
Xiao-Wen Chang    4      208        24.85