Abstract |
---|
Production data analytic workloads typically consist of a majority of jobs with small input data sizes and a small number of jobs with large input data sizes. Recent works advocate heterogeneous scale-up/scale-out clusters (Hybrid clusters for short) to handle these heterogeneous workloads, since scale-up machines (i.e., single machines with more resources added) can process small jobs faster than a cluster that is simply scaled out with cheap machines. However, implementing such a Hybrid cluster raises several challenges in job placement and data placement. In this paper, we propose a job placement strategy and a data placement strategy to address these challenges. The job placement strategy places each job on either scale-up or scale-out machines based on the job's characteristics, and migrates jobs from scale-up machines to under-utilized scale-out machines to achieve load balance. The data placement strategy allocates data replicas across the two types of machines accordingly, to increase data locality in the Hybrid cluster. We implemented a Hybrid cluster on Apache YARN and evaluated its performance using a Facebook production workload. With our proposed strategies, a Hybrid cluster reduces the makespan of the workload by up to 37% and the median job completion time by up to 60%, compared to traditional scale-out clusters with state-of-the-art schedulers. |
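The abstract does not spell out the placement rules, so the following Python sketch only illustrates the general shape of the two strategies; it is not the authors' algorithm. It assumes a single input-size cutoff (`SMALL_JOB_INPUT_GB`, a hypothetical value) stands in for "the job's characteristics", models each machine type as a pool of task slots, migrates only jobs still waiting in the scale-up queue, and uses a hypothetical replica rule (one replica on a scale-up node for small-job data, the rest on scale-out nodes). All names here (`Job`, `Pool`, `place_job`, `rebalance`, `place_replicas`) are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff (assumption): jobs at or below this input size count as "small".
SMALL_JOB_INPUT_GB = 10.0


@dataclass
class Job:
    name: str
    input_gb: float


@dataclass
class Pool:
    """A group of machines of one type, modeled as a fixed number of task slots."""
    name: str
    slots: int
    running: list = field(default_factory=list)
    queue: list = field(default_factory=list)

    def free_slots(self) -> int:
        return self.slots - len(self.running)

    def submit(self, job: Job) -> None:
        # Run immediately if a slot is free, otherwise wait in this pool's queue.
        if self.free_slots() > 0:
            self.running.append(job)
        else:
            self.queue.append(job)


def place_job(job: Job, scale_up: Pool, scale_out: Pool) -> Pool:
    """Job placement (assumed rule): small jobs to scale-up, large jobs to scale-out."""
    target = scale_up if job.input_gb <= SMALL_JOB_INPUT_GB else scale_out
    target.submit(job)
    return target


def rebalance(scale_up: Pool, scale_out: Pool) -> list:
    """Move jobs waiting on scale-up machines to scale-out slots that are idle."""
    moved = []
    while scale_up.queue and scale_out.free_slots() > 0:
        job = scale_up.queue.pop(0)  # oldest waiting job first
        scale_out.submit(job)
        moved.append(job.name)
    return moved


def place_replicas(block_index: int, input_gb: float, up_nodes: list,
                   out_nodes: list, replication: int = 3) -> list:
    """Data placement (assumed rule): small-job data gets one scale-up replica,
    and the remaining replicas go to scale-out nodes, so a migrated job can
    still read local data. Replicas are distinct if len(out_nodes) >= replication."""
    chosen = []
    if input_gb <= SMALL_JOB_INPUT_GB:
        chosen.append(up_nodes[block_index % len(up_nodes)])
    for k in range(replication - len(chosen)):
        chosen.append(out_nodes[(block_index + k) % len(out_nodes)])
    return chosen


if __name__ == "__main__":
    up = Pool("scale-up", slots=2)
    out = Pool("scale-out", slots=4)
    for name, size in [("a", 1), ("b", 2), ("c", 3), ("d", 500)]:
        place_job(Job(name, size), up, out)
    print(rebalance(up, out))  # ['c']: the third small job waits, then moves
    print(place_replicas(0, 1.0, ["u1"], ["o1", "o2", "o3"]))  # ['u1', 'o1', 'o2']
```

Keeping one replica of small-job data on scale-out nodes is what makes the migration step useful in this sketch: a job moved off the scale-up pool does not lose data locality. A real YARN deployment would express these decisions through its scheduler and HDFS block placement rather than through in-memory pools.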
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICCCN.2019.8847060 | 2019 28th International Conference on Computer Communication and Networks (ICCCN) |
Keywords | Field | DocType |
---|---|---|
single machine,cheap machines,job placement strategy,data placement strategy,scale-out machines,data replicas,data locality,Facebook production workload,median job completion time,traditional scale-out clusters,big data analytics,heterogeneous clusters,production data analytic workloads,heterogeneous workloads,scale-up machines,hybrid cluster,Apache YARN | Small number,Cluster (physics),Locality,Job shop scheduling,Yarn,Workload,Computer science,Load balancing (computing),Big data,Distributed computing | Conference |
ISSN | ISBN | Citations |
---|---|---|
1095-2055 | 978-1-7281-1857-4 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 9 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhuozhao Li | 1 | 69 | 11.61 |
Haiying Shen | 2 | 1355 | 126.34 |
Lee Ward | 3 | 49 | 6.70 |