Title
Using Performance Measurements To Improve MapReduce Algorithms
Abstract
The Hadoop MapReduce software environment is used for parallel processing of data stored across distributed machines. Data mining algorithms of increasing sophistication are being implemented in MapReduce, bringing new challenges for performance measurement and tuning. We focus on analyzing a job after it completes, using information collected from Hadoop logs and machine metrics. Our analysis, inspired by [1] [2], goes beyond conventional Hadoop JobTracker analysis by integrating more data and providing web browser visualization tools. This paper describes examples where measurements helped diagnose subtle issues and improve algorithm performance. These examples demonstrate the value of correlating detailed information that standard Hadoop performance displays do not usually show.
Year: 2012
DOI: 10.1016/j.procs.2012.04.210
Venue: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2012
Keywords: software performance, Hadoop, MapReduce, cluster monitoring
Field: Data mining, Web browser, Visualization, Computer science, Parallel processing, Algorithm, Software performance testing, Performance measurement, Software, Data mining algorithm
DocType:
Journal:
Volume: 9
ISSN: 1877-0509
Citations: 0
PageRank: 0.34
References: 1
Authors: 3
Name               Order  Citations  PageRank
Todd D. Plantenga  1      0          0.34
Yung Ryn Choe      2      97         9.17
Ann Yoshimura      3      0          0.34