Title
The Case for Evaluating MapReduce Performance Using Workload Suites
Abstract
MapReduce systems face enormous challenges due to increasing growth, diversity, and consolidation of the data and computation involved. Provisioning, configuring, and managing large-scale MapReduce clusters require realistic, workload-specific performance insights that existing MapReduce benchmarks are ill-equipped to supply. In this paper, we build the case for going beyond benchmarks for MapReduce performance evaluations. We analyze and compare two production MapReduce traces to develop a vocabulary for describing MapReduce workloads. We show that existing benchmarks fail to capture rich workload characteristics observed in traces, and propose a framework to synthesize and execute representative workloads. We demonstrate that performance evaluations using realistic workloads give cluster operators new ways to identify workload-specific resource bottlenecks and to make workload-specific choices of MapReduce task schedulers. We expect that once available, workload suites would allow cluster operators to accomplish previously challenging tasks beyond what we can now imagine, thus serving as a useful tool to help design and manage MapReduce systems.
Year
2011
DOI
10.1109/MASCOTS.2011.12
Venue
MASCOTS
Keywords
processor scheduling, parallel processing, mapreduce, production mapreduce, workload, pattern clustering, mapreduce task scheduler, large-scale mapreduce cluster, mapreduce performance evaluation, realistic workloads, workload suites, workload-specific performance, mapreduce workloads, mapreduce benchmarks, large-scale mapreduce cluster management, resource allocation, mapreduce task schedulers, workload-specific resource bottlenecks, mapreduce performance, representative workloads, data handling, performance evaluation, performance, mapreduce workload, mapreduce benchmark, benchmark, mapreduce system, measurement, benchmark testing, production
Field
Computer science, Workload, Parallel processing, Provisioning, Real-time computing, Resource allocation, Operator (computer programming), Group method of data handling, Vocabulary, Benchmark (computing), Distributed computing
DocType
Conference
ISSN
1526-7539
ISBN
978-1-4577-0468-0
Citations
219
PageRank
7.80
References
10
Authors
4
Name               Order  Citations  PageRank
Yanpei Chen        1      917        41.46
Archana Ganapathi  2      860        54.96
Rean Griffith      3      2185       99.68
Randy H. Katz      4      16819      3018.89