Title
Fast Big Data Analysis in Geo-Distributed Cloud
Abstract
As cloud services grow to span more and more globally distributed datacenters, there is an increasing need for scheduling algorithms that automatically place tasks across these datacenters. In a geo-distributed cloud, limited WAN bandwidth has become the major bottleneck for fast big data analytics. The scheduling algorithm needs to minimize the global completion time by jointly optimizing task scheduling and WAN data transfer. In this paper, we model task scheduling as a community detection problem over the dependency relations between tasks, data, and datacenters, and propose a Community Detection-based Scheduling (CDS) algorithm that is able to minimize the WAN data transfer volume. We use the real China-Astronomy-Cloud network to evaluate the proposed algorithm. Experimental results show that the proposed algorithm reduces the total data transfer volume by up to 40.7% and the global completion time by up to 35.8%, compared with the Hypergraph Partition-based scheduling algorithm and the greedy scheduling algorithm.
Year
2016
DOI
10.1109/CLUSTER.2016.28
Venue
2016 IEEE International Conference on Cluster Computing (CLUSTER)
Keywords
geodistributed cloud, Big Data analysis, cloud services, globally distributed datacenters, WAN bandwidth, task scheduling, community detection, community detection-based scheduling, WAN data transfer volume, China-astronomy-cloud network, global completion time
Field
Lottery scheduling, Fixed-priority pre-emptive scheduling, Fair-share scheduling, Computer science, Parallel computing, Two-level scheduling, Real-time computing, Rate-monotonic scheduling, Earliest deadline first scheduling, Dynamic priority scheduling, Round-robin scheduling, Distributed computing
DocType
Conference
ISSN
1552-5244
ISBN
978-1-5090-3654-7
Citations
0
PageRank
0.34
References
6
Authors
4
Name
Order
Citations
PageRank
Yue Li1610.29
Zhao, L.2144.00
Chenzhou Cui3155.24
Yu, C.4121.90