Title
GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers.
Abstract
Today, data-intensive applications rely on geographically distributed systems to collect, store and process data. Data locality is a prominent technique for improving application performance and reducing the impact of network latency: jobs are scheduled directly on the nodes that host the data to be processed. MapReduce and Dryad are examples of frameworks that exploit locality by splitting jobs into multiple tasks, which are dispatched to process portions of the data locally. However, as the big data analysis ecosystem has shifted from single clusters to geo-distributed data centers, some data must unavoidably still be transferred over the network in order to reduce the schedule length. Existing scheduling techniques nevertheless lack a mechanism that efficiently blends data locality with inter-data center transfer requirements for data-intensive processing across dispersed data centers. The objective of this work is therefore to formulate and solve the makespan optimization problem for data-intensive job scheduling on geo-distributed data centers. To this end, we first formulate task placement and data access as a linear program and solve it with the GLPK solver. We then present a low-complexity heuristic scheduling algorithm called GeoDis, which lets data locality cope with the data transfer requirement to achieve a better makespan. Experiments with various realistic traces and synthetically generated workloads show that GeoDis can reduce the makespan of processing jobs by 44% compared with state-of-the-art algorithms and remains within 91% of the optimal solution found by the LP solver.
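The trade-off the abstract describes, between running a task where its data lives and paying a transfer cost to run it elsewhere, can be illustrated with a small greedy list scheduler. This is a minimal sketch and not the GeoDis algorithm or the paper's LP formulation. The function name `greedy_schedule`, the one-task-at-a-time model of a data center, the fixed per-pair transfer times and the sample workload are all assumptions made for illustration.

```python
def greedy_schedule(tasks, centers, transfer_time):
    """Assign each task to the data center where it would finish earliest.

    tasks: list of (task_id, home_center, work) tuples, where home_center
           hosts the task's input data (hypothetical model).
    transfer_time: dict mapping (source, destination) to the assumed time
           needed to move a task's input between two data centers.
    Returns (placement, makespan).
    """
    # ready[c] is the time at which data center c becomes free again.
    ready = {c: 0.0 for c in centers}
    placement = {}
    # Consider the longest tasks first, a common list-scheduling order.
    for task_id, home, work in sorted(tasks, key=lambda t: -t[2]):
        best_center, best_finish = None, float("inf")
        for c in centers:
            # Running at the home center needs no transfer (data locality).
            move = 0.0 if c == home else transfer_time[(home, c)]
            finish = ready[c] + move + work
            if finish < best_finish:
                best_center, best_finish = c, finish
        ready[best_center] = best_finish
        placement[task_id] = best_center
    return placement, max(ready.values())


# Hypothetical workload: all input data sits in data center "A".
centers = ["A", "B"]
transfer_time = {("A", "B"): 1.0, ("B", "A"): 1.0}
tasks = [("t1", "A", 4.0), ("t2", "A", 3.0), ("t3", "A", 3.0), ("t4", "A", 2.0)]

placement, makespan = greedy_schedule(tasks, centers, transfer_time)
# Makespan if every task stayed with its data on "A".
local_only_makespan = sum(work for _, _, work in tasks)
print(placement, makespan, local_only_makespan)
```

In this toy workload, strict locality keeps every task on "A", while allowing transfers to "B" shortens the schedule even though each move costs extra time. A greedy rule like this gives no optimality guarantee, which is why an exact LP solution is useful as a baseline.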
Year
2018
DOI
10.1007/s00607-017-0564-7
Venue
Computing
Keywords
Geo-distributed; Data center; Scheduling; Data locality; Batch jobs; Big data analysis; 90C05 Linear programming; 90C27 Combinatorial optimization; 90C46 Optimality conditions, duality
Field
Locality; Job shop scheduling; Scheduling (computing); Computer science; Parallel computing; Job scheduler; Solver; Data access; Data center; Big data; Distributed computing
DocType
Journal
Volume
100
Issue
1
ISSN
0010-485X
Citations
5
PageRank
0.42
References
38
Authors
4
Name | Order | Citations | PageRank
Moïse W. Convolbo | 1 | 5 | 0.42
Jerry Chou | 2 | 23 | 8.25
Ching-Hsien Hsu | 3 | 1121 | 125.53
Yeh-Ching Chung | 4 | 983 | 97.16