Title
Evaluation of Data Locality Strategies for Hybrid Cloud Bursting of Iterative MapReduce.
Abstract
Hybrid cloud bursting (i.e., leasing temporary off-premise cloud resources to boost the overall capacity during peak utilization) is a popular and cost-effective way to deal with the increasing complexity of big data analytics. It is particularly promising for iterative MapReduce applications that reuse massive amounts of input data at each iteration, which compensates for the high overhead and cost of concurrent data transfers from the on-premise to the off-premise VMs over a weak, limited-capacity inter-site link. In this paper, we study how to combine various MapReduce data locality techniques designed for hybrid cloud bursting in order to achieve scalability for iterative MapReduce applications in a cost-effective fashion. This is a non-trivial problem due to the complex interaction between the data movements over the weak link and the scheduling of computational tasks that have to adapt to the shifting data distribution. We show that, using the right combination of techniques, iterative MapReduce applications can scale well in a hybrid cloud bursting scenario and even come close to the scalability observed in single sites.
Year
2017
DOI
10.1109/CCGRID.2017.96
Venue
CCGrid
Keywords
Hybrid Cloud, Big Data Analytics, Data locality, I/O and Data Management, Scheduling
Field
Locality, Data transmission, Reuse, Computer science, Scheduling (computing), Interference (wave propagation), Big data, Scalability, Cloud computing, Distributed computing
DocType
Conference
ISSN
2376-4414
ISBN
978-1-5090-5980-5
Citations
4
PageRank
0.39
References
14
Authors
5
Name                            Order  Citations  PageRank
Francisco J. Clemente-Castelló  1      20         2.68
Bogdan Nicolae                  2      392        29.51
M. Mustafa Rafique              3      157        15.49
Rafael Mayo                     4      762        76.75
Juan Carlos Fernández           5      72         9.77