Adaptive Preshuffling In Hadoop Clusters - Citegraph

Paper Info

Title
Adaptive Preshuffling In Hadoop Clusters

Abstract
MapReduce has become an important distributed processing model for large-scale data-intensive applications such as data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. In this paper, we propose a new preshuffling strategy in Hadoop to reduce the high network loads imposed by shuffle-intensive applications. Designing new shuffling strategies is appealing for Hadoop clusters in which network interconnects are a performance bottleneck when the clusters are shared among a large number of applications. The network interconnects are likely to become a scarce resource when many shuffle-intensive applications share a Hadoop cluster. We implemented the push model along with the preshuffling scheme in the Hadoop system, where a 2-stage pipeline was incorporated into the preshuffling scheme. Using two Hadoop benchmarks running on a 10-node cluster, we conducted experiments to show that preshuffling-enabled Hadoop clusters are faster than native Hadoop clusters. For example, the push model and the preshuffling scheme powered by the 2-stage pipeline can shorten the execution times of the WordCount and Sort Hadoop applications by an average of 10% and 14%, respectively.

Year	DOI	Venue
2013	10.1016/j.procs.2013.05.422	2013 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE
Field	DocType	Volume
Web indexing,Cluster (physics),Bottleneck,Computer science,Parallel computing,sort,Response time,Shuffling	Conference	18
ISSN	Citations	PageRank
1877-0509	1	0.35
References	Authors
11	6

Authors (6 rows)

Name	Order	Citations	PageRank
Jiong Xie	1	161	10.15
Yun Tian	2	150	9.81
Shu Yin	3	307	22.05
Ji Zhang	4	20	3.75
Xiaojun Ruan	5	390	25.87
Xiao Qin	6	1836	125.69
