Title
Phoenix: A Constraint-Aware Scheduler for Heterogeneous Datacenters
Abstract
Today's datacenters are becoming increasingly diverse in both hardware and software architectures in order to support a myriad of applications. These applications are also heterogeneous in their job response times and resource requirements (e.g., number of cores, GPUs, network speed), which are expressed as task constraints. Constraints help ensure task performance guarantees and Quality of Service (QoS) by enabling an application to express its specific resource requirements. While several recently proposed schedulers aim to improve overall application and system performance, few of them consider resource constraints across tasks when making scheduling decisions. Furthermore, latency-critical workloads and short-lived jobs, which typically constitute about 90% of the total jobs in a datacenter, have strict QoS requirements that can be met by minimizing tail latency through effective scheduling. In this paper, we propose Phoenix, a constraint-aware hybrid scheduler that addresses both problems (constraint awareness and low tail latency) by minimizing job response times at constrained workers. We use a novel Constraint Resource Vector (CRV) based scheduling approach, which in turn facilitates reordering of the jobs in a queue to minimize tail latency. We have used the publicly available Google traces to analyze their constraint characteristics and have embedded these constraints in Cloudera and Yahoo cluster traces for studying the impact of traces on system performance. Experiments with Google, Cloudera and Yahoo cluster traces on a 15,000-worker-node cluster show that Phoenix improves the 99th percentile job response times by 1.9× on average across all three traces when compared against a state-of-the-art hybrid scheduler. Further, in comparison to other distributed schedulers such as Hawk, it improves the 90th and 99th percentile job response times by 4.5× and 5×, respectively.
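The two ideas the abstract combines, placing a task only on workers that satisfy its constraints and judging the result by tail-percentile response time, can be illustrated with a minimal sketch. This is not Phoenix's CRV algorithm, which the abstract does not specify. The `Worker` and `Task` classes, the `place` and `percentile` helpers, the shortest-queue tie-breaking rule, and all field names are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Worker:
    # Hypothetical worker description; field names are illustrative only.
    name: str
    cores: int
    gpus: int
    net_gbps: float
    queue: list = field(default_factory=list)  # ids of pending tasks


@dataclass
class Task:
    # Hypothetical task constraints, mirroring the abstract's examples
    # (number of cores, GPUs, network speed).
    task_id: str
    min_cores: int = 0
    min_gpus: int = 0
    min_net_gbps: float = 0.0


def satisfies(worker, task):
    """Return True if the worker meets every constraint of the task."""
    return (worker.cores >= task.min_cores
            and worker.gpus >= task.min_gpus
            and worker.net_gbps >= task.min_net_gbps)


def place(task, workers):
    """Enqueue the task on the eligible worker with the shortest queue.

    Assumption: shortest-queue placement stands in for a real policy.
    Returns the chosen worker, or None if no worker is eligible.
    """
    eligible = [w for w in workers if satisfies(w, task)]
    if not eligible:
        return None
    target = min(eligible, key=lambda w: len(w.queue))
    target.queue.append(task.task_id)
    return target


def percentile(values, p):
    """Nearest-rank percentile (p in 1..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

In this sketch a task that requires a GPU is never queued on a CPU-only worker, however short that worker's queue is. A constrained task therefore has fewer candidate workers, which is the source of the queueing delay that a constraint-aware scheduler tries to reduce.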
Year
2017
DOI
10.1109/ICDCS.2017.262
Venue
2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)
Keywords
Scheduling, Hybrid, Heterogeneous Data Center, Constraint-aware, Resource Management, Performance
Field
Latency (engineering), Scheduling (computing), Computer science, Queue, Computer network, Quality of service, Software, Phoenix, Distributed computing, Cloud computing
DocType
Conference
ISSN
1063-6927
ISBN
978-1-5386-1793-9
Citations
6
PageRank
0.40
References
26
Authors
5
Name
Order
Citations
PageRank
Prashanth Thinakaran1214.09
Jashwant Raj Gunasekaran2183.96
Bikash Sharma31165.09
Mahmut Taylan Kandemir43811.03
Chita R. Das5146780.03