Dynamic Data Redistribution for MapReduce Joins - Citegraph

Paper Info

Title
Dynamic Data Redistribution for MapReduce Joins

Abstract
MapReduce has become a popular method for data processing, in particular for large-scale datasets, due to its accessibility as a scalable yet convenient programming paradigm. Data processing tasks often involve joins, and the repartition and fragment-replicate joins are two widely used join algorithms utilised within the MapReduce framework. This paper presents a multi-join that supports tuple redistribution, building on both the repartition and fragment-replicate joins. Hadoop is used to demonstrate how reduce tasks may improve performance by passing intermediate results to other reduce tasks that are better able to process them, with Apache ZooKeeper serving as the means of communication and data transfer. A performance analysis is presented showing that the technique has the potential to reduce response times when processing multiple joins in a single MapReduce job.
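
For readers unfamiliar with the two join algorithms named in the abstract, the following is a minimal in-memory sketch of the general ideas, not the paper's implementation. It assumes each input is a list of (key, value) pairs, and the function names, the `num_reducers` and `num_fragments` parameters, and the sample data are all hypothetical. It does not model Hadoop, ZooKeeper, or the tuple redistribution the paper proposes.

```python
from collections import defaultdict


def repartition_join(left, right, num_reducers=3):
    """Reduce-side join sketch: both inputs are shuffled by join key."""
    # "Map" phase: route every tuple to a partition chosen by hashing its key,
    # keeping left and right values apart (the usual source tag).
    partitions = [defaultdict(lambda: ([], [])) for _ in range(num_reducers)]
    for key, value in left:
        partitions[hash(key) % num_reducers][key][0].append(value)
    for key, value in right:
        partitions[hash(key) % num_reducers][key][1].append(value)
    # "Reduce" phase: each partition pairs up the values that share a key.
    out = []
    for part in partitions:
        for key, (lvals, rvals) in part.items():
            out.extend((key, lv, rv) for lv in lvals for rv in rvals)
    return out


def fragment_replicate_join(large, small, num_fragments=3):
    """Map-side join sketch: the small input is replicated to every fragment."""
    # Build one lookup table from the small input (the replicated side).
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    # Split the large input into fragments; each fragment joins independently.
    fragments = [large[i::num_fragments] for i in range(num_fragments)]
    out = []
    for fragment in fragments:
        for key, lval in fragment:
            out.extend((key, lval, sval) for sval in lookup.get(key, []))
    return out


# Hypothetical sample data, used only to show that both sketches agree.
orders = [(1, "o1"), (2, "o2"), (1, "o3")]
customers = [(1, "alice"), (2, "bob"), (3, "carol")]
repartition_result = sorted(repartition_join(orders, customers))
fragment_replicate_result = sorted(fragment_replicate_join(orders, customers))
```

In this sketch the repartition join moves both inputs across partitions, whereas the fragment-replicate join moves only the small input, which is why the latter tends to suit cases where one input is small enough to copy to every worker.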

Year	DOI	Venue
2011	10.1109/CloudCom.2011.111	Cloud Computing Technology and Science
Keywords	Field	DocType
performance analysis,mapreduce joins,large scale datasets,processing multiple,mapreduce framework,data processing,dynamic data redistribution,popular method,intermediate result,convenient programming paradigm,apache zookeeper,single mapreduce job,dynamic data,programming paradigm,data transfer,data handling,servers,database management,algorithm design,parallel programming,resource description framework,algorithm design and analysis	Joins,Programming paradigm,Tuple,Computer science,Parallel computing,Server,Dynamic data,Group method of data handling,RDF,Distributed computing,Scalability	Conference
ISBN	Citations	PageRank
978-1-4673-0090-2	5	0.48
References	Authors
10	4

Authors (4 rows)

Name	Order	Citations	PageRank
Steven Lynden	1	19	2.19
Yusuke Tanimura	2	170	16.80
Isao Kojima	3	143	20.38
Akiyoshi Matono	4	79	10.05
