Title |
---|
Spread-n-share: improving application performance and cluster throughput with resource-aware job placement |

Abstract |
---|
Traditional batch job schedulers adopt the Compact-n-Exclusive (CE) strategy, packing the processes of a parallel job onto as few compute nodes as possible. While CE minimizes inter-node network communication, it often causes self-contention among the tasks of a resource-intensive application. Recent studies have used virtual containers to balance CPU utilization and memory capacity across physical nodes, but the imbalance in cache and memory bandwidth usage remains under-investigated.
In this work, we propose Spread-n-Share (SNS), a new batch scheduling strategy that automatically scales resource-bound applications out onto more nodes to alleviate their performance bottlenecks and co-locates jobs in a resource-compatible manner. We implement Uberun, a prototype scheduler, to validate SNS, considering shared-cache capacity and memory bandwidth as two types of performance-critical shared resources. Experimental results using 12 diverse cluster workloads show that SNS improves the overall system throughput by 19.8% on average over CE, while achieving an average individual job speedup of 1.8%.
|
Year | DOI | Venue |
---|---|---|
2019 | 10.1145/3295500.3356152 | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis |

Field | DocType | ISBN |
---|---|---|
Bottleneck, Memory bandwidth, Cache, CPU time, Computer science, Batch processing, Job scheduler, Throughput, Distributed computing, Speedup | Conference | 978-1-4503-6229-0 |

Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |

Authors |
---|
7 |

Name | Order | Citations | PageRank |
---|---|---|---|
Xiongchao Tang | 1 | 56 | 6.06 |
Haojie Wang | 2 | 2 | 3.75 |
Xiaosong Ma | 3 | 1117 | 68.36 |
Nosayba El-Sayed | 4 | 133 | 9.64 |
Jidong Zhai | 5 | 340 | 36.27 |
Wenguang Chen | 6 | 1014 | 70.57 |
Ashraf Aboulnaga | 7 | 1289 | 91.33 |