Title
On exploiting data locality for iterative MapReduce applications in hybrid clouds.
Abstract
Hybrid cloud bursting (i.e., leasing temporary off-premise cloud resources to boost capacity during peak utilization) has made a significant impact, especially for big data analytics, where the explosion of data sizes and increasingly complex computations frequently lead to insufficient local data center capacity. Cloud bursting, however, introduces a major challenge to runtime systems due to the limited throughput and high latency of data transfers between on-premise and off-premise resources (the weak link). This issue and how to address it are not well understood. We contribute a comprehensive study of what challenges arise in this context, what potential strategies can be applied to address them, and what best practices can be leveraged in real life. Specifically, we focus our study on iterative MapReduce applications, a class of large-scale data-intensive applications particularly popular on hybrid clouds. In this context, we study how data locality can be leveraged over the weak link both from the storage layer perspective (when and how to move data off-premise) and from the scheduling perspective (when to compute off-premise). We conclude with a brief discussion on how to set up an experimental framework suitable to study the effectiveness of our proposal in future work.
Year: 2016
DOI: 10.1145/3006299.3006329
Venue: BDCAT
Keywords: Hybrid Cloud, Big Data Analytics, Data locality, I/O and Data Management, Scheduling
Field: Data mining, Locality, Data transmission, Computer science, Scheduling (computing), Throughput, Big data, Data center, Cloud computing, Computation, Distributed computing
DocType: Conference
ISBN: 978-1-5090-4468-9
Citations: 2
PageRank: 0.39
References: 3
Authors: 5
Name                            Order  Citations  PageRank
Francisco J. Clemente-Castelló  1      20         2.68
Bogdan Nicolae                  2      392        29.51
Rafael Mayo                     3      762        76.75
Juan Carlos Fernández           4      72         9.77
M. Mustafa Rafique              5      157        15.49