Title
Morpheus: Towards Automated SLOs for Enterprise Clusters.
Abstract
Modern resource management frameworks for large-scale analytics leave unresolved the problematic tension between high cluster utilization and predictable job performance, coveted by operators and users respectively. We address this in Morpheus, a new system that: 1) codifies implicit user expectations as explicit Service Level Objectives (SLOs), inferred from historical data, 2) enforces SLOs using novel scheduling techniques that isolate jobs from sharing-induced performance variability, and 3) mitigates inherent performance variance (e.g., due to failures) by dynamically reprovisioning jobs. We validate these ideas against production traces from a 50k-node cluster, and show that Morpheus can lower the number of deadline violations by 5× to 13×, while retaining cluster utilization and lowering cluster footprint by 14% to 28%. We demonstrate the scalability and practicality of our implementation by deploying Morpheus on a 2700-node cluster and running it against production-derived workloads.
Year
2016
Venue
OSDI
Field
Resource management, Service level objective, User expectations, Computer science, Scheduling (computing), Real-time computing, Operator (computer programming), Footprint, Analytics, Operating system, Scalability, Distributed computing
DocType
Conference
Citations
21
PageRank
0.72
References
24
Authors
11
Name                            Order  Citations  PageRank
Sangeetha Abdu Jyothi           1      48         5.74
Carlo Curino                    2      2012       90.35
Ishai Menache                   3      1022       52.56
Shravan Matthur Narayanamurthy  4      28         1.55
Alexey Tumanov                  5      554        24.61
Jonathan Yaniv                  6      100        4.74
Ruslan Mavlyutov                7      30         3.19
Iñigo Goiri                     8      1039       49.27
Subru Krishnan                  9      79         6.36
Janardhan Kulkarni              10     153        17.73
Sriram Rao                      11     440        23.78