TailCon: Power-Minimizing Tail Percentile Control of Response Time in Server Clusters - Citegraph

Paper Info

Title
TailCon: Power-Minimizing Tail Percentile Control of Response Time in Server Clusters

Abstract
To provide a satisfactory customer experience, operators of modern server clusters such as Amazon's usually define the Service Level Agreement (SLA) as a guarantee that a certain percentile (e.g., 99%) of customer requests have a response time within a threshold (e.g., 1 s). One way to meet the SLA constraint is to provision enough computing capacity for the estimated worst-case workload. However, this can cause unnecessary power consumption due to over-provisioning, especially when the workload is highly dynamic. In this paper, we propose an adaptive computing capacity allocation scheme referred to as TailCon. TailCon aims to minimize the power consumption of the server cluster while satisfying the SLA constraint by adjusting, online, the number of active servers and the CPU frequencies of the powered-on machines. In TailCon, we dynamically analyze the distribution of the request response time and use the measured response times to estimate the workload intensity in the server cluster; this estimate serves as continuous feedback for finding the proper provisioning of computing capacity online using optimization techniques. We evaluate TailCon through both emulation using real-world HTTP traces and experiments. The experimental results demonstrate the effectiveness of TailCon in enforcing the SLA constraint while reducing power consumption.
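
The tail-percentile SLA described in the abstract can be illustrated with a minimal Python sketch. This is not TailCon's algorithm, only a check of whether a batch of measured response times satisfies a constraint of the form "at least 99% of requests finish within 1 s". The function names `fraction_within` and `meets_sla` and the default values are assumptions chosen for illustration; they do not come from the paper.

```python
from typing import Sequence


def fraction_within(response_times_s: Sequence[float], threshold_s: float) -> float:
    """Return the fraction of requests whose response time is at most threshold_s."""
    if not response_times_s:
        raise ValueError("need at least one measured response time")
    within = sum(1 for t in response_times_s if t <= threshold_s)
    return within / len(response_times_s)


def meets_sla(
    response_times_s: Sequence[float],
    target_fraction: float = 0.99,  # assumed example: the 99th percentile
    threshold_s: float = 1.0,       # assumed example: a 1 s threshold
) -> bool:
    """Check whether at least target_fraction of requests met the threshold."""
    return fraction_within(response_times_s, threshold_s) >= target_fraction


if __name__ == "__main__":
    # Hypothetical measurements: 995 fast requests and 5 slow ones.
    samples = [0.2] * 995 + [1.5] * 5
    print(fraction_within(samples, 1.0), meets_sla(samples))
```

A controller in the spirit of the abstract would presumably run a check like this on recent measurements and add capacity when it fails or remove capacity when there is slack; how TailCon actually makes that decision is not specified on this page.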

Year	DOI	Venue
2012	10.1109/SRDS.2012.72	SRDS
Keywords	Field	DocType
sla constraint,optimisation,response time,tailcon,active server,amazon,customer request,server cluster,power consumption,request response time,power-minimizing tail percentile control,computing capacity,tailcon scheme,workload estimation,modern server cluster,transport protocols,satisfactory customer experience,internet,adaptive computing capacity allocation,service level agreement,workload intensity,workstation clusters,server clusters,continuous feedback,real-word http traces,computing capacity online,cpu frequency,optimization technique	Server farm,Computer science,Workload,Service-level agreement,Server,Computer network,Response time,Real-time computing,Emulation,Computer cluster,Request–response,Distributed computing	Conference
ISSN	ISBN	Citations
1060-9857	978-1-4673-2397-0	8
PageRank	References	Authors
0.49	14	4

Authors (4 rows)

Name	Order	Citations	PageRank
Xi Chen	1	333	70.76
Xue Liu	2	3058	193.41
Shengquan Wang	3	503	31.63
Xiao-Wen Chang	4	208	24.85
