Title
H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution.
Abstract
Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality for reducing network traffic. In such systems the distribution of data over the cluster resources plays a significant role, and when skewed, it can harm the performance of executing applications. In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks, and hence identifies required transfers of input data a priori, for timely bringing data close to the execution. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.
Year: 2016
Venue: ADBIS
Field: Information system, Locality, Scheduling (computing), Workload, Computer science, Data processing system, Exploit, Job scheduler, Group method of data handling, Database, Distributed computing
DocType: Conference
Citations: 0
PageRank: 0.34
References: 7
Authors: 4
Name              Order  Citations  PageRank
Petar Jovanovic   1      62         7.78
Oscar Romero      2      467        35.46
Toon Calders      3      1333       93.66
Alberto Abelló    4      848        61.88