Title
GeoClone: Online Task Replication and Scheduling for Geo-Distributed Analytics under Uncertainties
Abstract
The completion time of an analytics job can be significantly inflated by its slowest tasks. Although task replication is widely adopted to reduce such straggler latency, existing replication strategies are unsuitable for geo-distributed analytics environments, which are highly dynamic, uncertain, and heterogeneous. In this paper, we first model the task replication and scheduling problem over time, capturing the features of geo-distributed analytics. We then design an online algorithm, GeoClone, that selects tasks to replicate and sites to execute the task replicas in an irrevocable, online manner by jointly considering the execution progress of each job and the resource performance at each site. We rigorously prove a competitive ratio for GeoClone, which gives a theoretical performance guarantee relative to the offline optimal algorithm that knows all inputs in advance. Finally, we implement GeoClone with Spark and Yarn for experiments and also conduct extensive large-scale simulations, which confirm GeoClone's practical superiority over multiple state-of-the-art replication strategies.
Year
2020
DOI
10.1109/IWQoS49365.2020.9212862
Venue
2020 IEEE/ACM 28th International Symposium on Quality of Service (IWQoS)
DocType
Conference
ISBN
978-1-7281-6887-6
Citations
0
PageRank
0.34
References
0
Authors
5
Name            Order  Citations  PageRank
Tiantian Wang   1      0          0.34
Zhuzhong Qian   2      380        51.27
Lei Jiao        3      0          0.68
Xin Li          4      0          0.68
Sanglu Lu       5      1380       144.07