Title
SmartJoin: a network-aware multiway join for MapReduce
Abstract
MapReduce is an effective tool for processing large amounts of data in parallel using a cluster of processors or computers. One common data processing task is the join operation, which combines two or more datasets based on values common to each. In this paper, we present SmartJoin, a network-aware multiway join for MapReduce that improves performance and takes network traffic into account when redistributing workload among reducers. SmartJoin achieves this by dynamically redistributing tuples directly between reducers using an intelligent network-aware algorithm. We show that the technique has significant potential to reduce the time required to join multiple datasets. In our evaluation, SmartJoin achieves up to a 39% improvement over the non-redistribution method, a 26.8% improvement over random redistribution, and a 27.6% improvement over worst join redistribution.
| Year | DOI | Venue |
| --- | --- | --- |
| 2014 | 10.1007/s10586-014-0348-1 | Cluster Computing |
| Keywords | Field | DocType |
| --- | --- | --- |
| MapReduce, Hadoop, Multiway join, Workload redistribution | Data processing, Tuple, Computer science, Workload, Network aware, Intelligent Network, Distributed computing | Journal |
| Volume | Issue | ISSN |
| --- | --- | --- |
| 17 | 3 | 1386-7857 |
| Citations | PageRank | References |
| --- | --- | --- |
| 7 | 0.43 | 26 |
Authors: 4
| Name | Order | Citations | PageRank |
| --- | --- | --- | --- |
| Kenn Slagter | 1 | 58 | 4.00 |
| Ching-Hsien Hsu | 2 | 1121 | 125.53 |
| Yeh-Ching Chung | 3 | 983 | 97.16 |
| Gangman Yi | 4 | 62 | 11.68 |