Title
Hadoop framework: impact of data organization on performance.
Abstract
Hadoop, based on the popular MapReduce framework, is an open-source distributed computing framework that has been gaining popularity and usage. It aims to let programmers focus on building applications that process large amounts of data, without having to handle the other issues that arise when performing parallel computations. However, tuning the performance of Hadoop applications is not an easy task because of the framework's level of abstraction. In this paper, we present three case studies and some of the challenges and issues to be considered in performance tuning when running applications in Hadoop. The focus is mainly on the impact of input data on Hadoop's performance and how that performance can be tuned. Copyright (c) 2011 John Wiley & Sons, Ltd.
Year
2013
DOI
10.1002/spe.1082
Venue
SOFTWARE-PRACTICE & EXPERIENCE
Keywords
mapreduce, hadoop, performance tuning, distributed computing
DocType
Journal
Volume
43
Issue
SP11
ISSN
0038-0644
Citations
4
PageRank
0.39
References
3
Authors
9
Name                Order   Citations   PageRank
Yu Shyang Tan       1       70          4.58
Jiaqi Tan           2       412         25.57
Eng Siong Chng      3       970         106.33
Bu-Sung Lee         4       2119        140.18
Jiaming Li          5       25          3.80
Susumu Date         6       133         28.14
Hui Ping Chak       7       4           0.39
Xiong Xiao          8       281         34.97
Atsushi Narishige   9       10          1.33