H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution. - Citegraph

Paper Info

Title
H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution.

Abstract
Today’s distributed data processing systems typically follow a query-shipping approach and exploit data locality to reduce network traffic. In such systems, the distribution of data over the cluster resources plays a significant role; when it is skewed, it can harm the performance of the applications being executed. In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios for tasks and hence identifies the required transfers of input data a priori, so that data can be brought close to where it will be processed in a timely manner. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.
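The abstract describes the idea only at a high level. The following is a minimal sketch of the general principle of workload-driven data redistribution, not the H-WorD algorithm from the paper. It assumes (all hypothetically): uniform nodes, a known cost estimate per task, a single fixed network bandwidth, transfers that start ahead of execution at time 0, and no contention between concurrent transfers. The names `Task`, `plan`, `replicas`, `block_size`, and `bandwidth` are illustrative only. Each task is placed on the node where it would finish earliest; when that node does not hold the input, the sketch records a transfer to schedule in advance.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cost: float         # hypothetical: estimated processing time, same on every node
    replicas: set[str]  # nodes that already store the task's input block
    block_size: float   # hypothetical: input size, in units of bandwidth * time


def plan(tasks: list[Task], nodes: list[str], bandwidth: float):
    """Greedily place tasks by estimated finish time (illustrative sketch).

    A non-local placement requires shipping the input block. The transfer is
    assumed to start at time 0, ahead of execution, so it only delays the task
    if the target node would otherwise sit idle before the block arrives.
    Network contention between concurrent transfers is ignored.
    """
    load = {n: 0.0 for n in nodes}  # estimated busy-until time per node
    assignment: dict[str, str] = {}
    transfers: list[tuple[str, str, str]] = []  # (task, source node, target node)

    # Longest tasks first, a common list-scheduling heuristic.
    for task in sorted(tasks, key=lambda t: t.cost, reverse=True):
        best_node, best_finish = None, float("inf")
        for node in nodes:
            arrival = 0.0 if node in task.replicas else task.block_size / bandwidth
            finish = max(load[node], arrival) + task.cost
            if finish < best_finish:  # ties keep the earlier node in `nodes`
                best_node, best_finish = node, finish
        load[best_node] = best_finish
        assignment[task.name] = best_node
        if best_node not in task.replicas:
            source = min(task.replicas)  # any replica holder; min() keeps it deterministic
            transfers.append((task.name, source, best_node))
    return assignment, transfers, load


# Usage: three equal tasks whose inputs are all skewed onto node n1.
tasks = [Task(f"t{i}", 10.0, {"n1"}, 64.0) for i in (1, 2, 3)]
assignment, transfers, load = plan(tasks, ["n1", "n2"], bandwidth=32.0)
print(assignment)  # {'t1': 'n1', 't2': 'n2', 't3': 'n1'}
print(transfers)   # [('t2', 'n1', 'n2')]
print(load)        # {'n1': 20.0, 'n2': 12.0}
```

Keeping every task local in this toy example would finish at time 30; shipping one block to n2 ahead of time brings the estimated makespan down to 20. A practical planner would additionally have to account for replica placement policies, network contention, and errors in the workload estimates.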

Year: 2016
Venue: ADBIS
Field: Information system, Locality, Scheduling (computing), Workload, Computer science, Data processing system, Exploit, Job scheduler, Group method of data handling, Database, Distributed computing
DocType: Conference
Citations: 0
PageRank: 0.34
References: 7
Authors: 4

Authors

Name	Order	Citations	PageRank
Petar Jovanovic	1	62	7.78
Oscar Romero	2	467	35.46
Toon Calders	3	1333	93.66
Alberto Abelló	4	848	61.88
