Abstract |
---|
Modern resource management frameworks for large-scale analytics leave unresolved the problematic tension between high cluster utilization and jobs' performance predictability, coveted by operators and users respectively. We address this in Morpheus, a new system that: 1) codifies implicit user expectations as explicit Service Level Objectives (SLOs), inferred from historical data; 2) enforces SLOs using novel scheduling techniques that isolate jobs from sharing-induced performance variability; and 3) mitigates inherent performance variance (e.g., due to failures) by means of dynamic reprovisioning of jobs. We validate these ideas against production traces from a 50k-node cluster, and show that Morpheus can lower the number of deadline violations by 5× to 13× while retaining cluster utilization and lowering cluster footprint by 14% to 28%. We demonstrate the scalability and practicality of our implementation by deploying Morpheus on a 2700-node cluster and running it against production-derived workloads. |
Year | Venue | Field |
---|---|---|
2016 | OSDI | Resource management, Service level objective, User expectations, Computer science, Scheduling (computing), Real-time computing, Operator (computer programming), Footprint, Analytics, Operating system, Scalability, Distributed computing |
DocType | Citations | PageRank |
---|---|---|
Conference | 21 | 0.72 |
References | Authors |
---|---|
24 | 11 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sangeetha Abdu Jyothi | 1 | 48 | 5.74 |
Carlo Curino | 2 | 2012 | 90.35 |
Ishai Menache | 3 | 1022 | 52.56 |
Shravan Matthur Narayanamurthy | 4 | 28 | 1.55 |
Alexey Tumanov | 5 | 554 | 24.61 |
Jonathan Yaniv | 6 | 100 | 4.74 |
Ruslan Mavlyutov | 7 | 30 | 3.19 |
Iñigo Goiri | 8 | 1039 | 49.27 |
Subru Krishnan | 9 | 79 | 6.36 |
Janardhan Kulkarni | 10 | 153 | 17.73 |
Sriram Rao | 11 | 440 | 23.78 |