The Case for Evaluating MapReduce Performance Using Workload Suites - Citegraph

Paper Info

Title
The Case for Evaluating MapReduce Performance Using Workload Suites

Abstract
MapReduce systems face enormous challenges due to increasing growth, diversity, and consolidation of the data and computation involved. Provisioning, configuring, and managing large-scale MapReduce clusters require realistic, workload-specific performance insights that existing MapReduce benchmarks are ill-equipped to supply. In this paper, we build the case for going beyond benchmarks for MapReduce performance evaluations. We analyze and compare two production MapReduce traces to develop a vocabulary for describing MapReduce workloads. We show that existing benchmarks fail to capture the rich workload characteristics observed in traces, and propose a framework to synthesize and execute representative workloads. We demonstrate that performance evaluations using realistic workloads give cluster operators new ways to identify workload-specific resource bottlenecks and to make workload-specific choices of MapReduce task schedulers. We expect that once available, workload suites would allow cluster operators to accomplish previously challenging tasks beyond what we can now imagine, thus serving as a useful tool to help design and manage MapReduce systems.

Year	DOI	Venue
2011	10.1109/MASCOTS.2011.12	MASCOTS
Keywords	Field	DocType
processor scheduling,parallel processing,mapreduce,production mapreduce,workload,pattern clustering,mapreduce task scheduler,large-scale mapreduce cluster,mapreduce performance evaluation,realistic workloads,workload suites,workload-specific performance,mapreduce workloads,mapreduce benchmarks,large-scale mapreduce cluster management,resource allocation,mapreduce task schedulers,workload-specific resource bottlenecks,mapreduce performance,representative workloads,data handling,performance evaluation,performance,mapreduce workload,mapreduce benchmark,benchmark,mapreduce system,measurement,benchmark testing,production	Computer science,Workload,Parallel processing,Provisioning,Real-time computing,Resource allocation,Operator (computer programming),Group method of data handling,Vocabulary,Benchmark (computing),Distributed computing	Conference
ISSN	ISBN	Citations
1526-7539	978-1-4577-0468-0	219
PageRank	References	Authors
7.80	10	4

Authors (4 rows)

Name	Order	Citations	PageRank
Yanpei Chen	1	917	41.46
Archana Ganapathi	2	860	54.96
Rean Griffith	3	2185	99.68
Randy H. Katz	4	16819	3018.89
