Title
Performance Tuning of MapReduce Jobs Using Surrogate-based Modeling
Abstract
Modeling workflow performance is crucial for finding optimal configuration parameters and minimizing execution times. We apply surrogate-based modeling to the performance tuning of MapReduce jobs. The surrogate model is a multivariate polynomial with one variable for each parameter to be tuned. For illustrative purposes, we focus on just two parameters: the number of parallel mappers and the number of parallel reducers. We demonstrate that an accurate performance model can be built by sampling a small subset of the parameter space. We compare the accuracy and the cost of building the model across different sampling methods and different modeling approaches. We conclude that the surrogate-based approach we describe is both less expensive in terms of sampling time and more accurate than other well-known tuning methods.
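The idea in the abstract can be illustrated with a minimal sketch: fit a polynomial in two variables (mappers, reducers) to a handful of sampled run times by least squares, estimate its error with k-fold cross validation, and pick the configuration the model predicts to be fastest. Everything concrete here is an assumption made for illustration and is not taken from the paper: the polynomial degree, the uniform random sampling, the parameter ranges, and the `toy_runtime` function that stands in for real job measurements.

```python
import itertools
import numpy as np

def design_matrix(mappers, reducers, degree):
    """One column per monomial m^i * r^j with i + j <= degree."""
    m = np.asarray(mappers, dtype=float)
    r = np.asarray(reducers, dtype=float)
    cols = [m**i * r**j
            for i, j in itertools.product(range(degree + 1), repeat=2)
            if i + j <= degree]
    return np.column_stack(cols)

def fit_surrogate(mappers, reducers, runtimes, degree=2):
    """Least-squares fit of the polynomial coefficients."""
    A = design_matrix(mappers, reducers, degree)
    t = np.asarray(runtimes, dtype=float)
    coeffs, *_ = np.linalg.lstsq(A, t, rcond=None)
    return coeffs

def predict(coeffs, mappers, reducers, degree=2):
    return design_matrix(mappers, reducers, degree) @ coeffs

def kfold_rmse(mappers, reducers, runtimes, degree=2, k=5, seed=0):
    """Root-mean-square prediction error over k held-out folds."""
    m, r, t = (np.asarray(a, dtype=float)
               for a in (mappers, reducers, runtimes))
    idx = np.random.default_rng(seed).permutation(len(t))
    sq_errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        c = fit_surrogate(m[train], r[train], t[train], degree)
        sq_errors.append((predict(c, m[fold], r[fold], degree) - t[fold]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_errors))))

def toy_runtime(m, r):
    # Hypothetical stand-in for measured job run times (not real data).
    return (m - 12.0) ** 2 + 2.0 * (r - 6.0) ** 2 + 100.0

# Sample a small part of the (assumed) 24 x 12 parameter space.
rng = np.random.default_rng(42)
sample_m = rng.integers(1, 25, size=20)   # mappers in 1..24
sample_r = rng.integers(1, 13, size=20)   # reducers in 1..12
sample_t = toy_runtime(sample_m, sample_r)

coeffs = fit_surrogate(sample_m, sample_r, sample_t, degree=2)

# Evaluate the surrogate on the full grid and take its predicted minimum.
grid_m, grid_r = np.meshgrid(np.arange(1, 25), np.arange(1, 13))
pred = predict(coeffs, grid_m.ravel(), grid_r.ravel(), degree=2)
best = int(np.argmin(pred))
best_m, best_r = int(grid_m.ravel()[best]), int(grid_r.ravel()[best])
print("predicted best (mappers, reducers):", (best_m, best_r))
print("5-fold RMSE:", kfold_rmse(sample_m, sample_r, sample_t))
```

With real measurements, the cross-validation error would be one way to choose the polynomial degree and to compare sampling strategies, since each sample costs a full job run.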
Year
2015
DOI
10.1016/j.procs.2015.05.193
Venue
Procedia Computer Science
Keywords
Polynomial surface, k-fold cross validation, Parameter tuning, Sampling methods
Field
Data mining, Computer science, Surrogate model, Artificial intelligence, Parameter space, Performance model, Performance tuning, Small set, Workflow, Mathematical optimization, Sampling (statistics), Multivariate polynomials, Machine learning
DocType
Conference
Volume
51
Issue
C
ISSN
1877-0509
Citations
3
PageRank
0.48
References
6
Authors (4)
Name, Order, Citations, PageRank
Travis Johnston, 1, 11, 1.97
Mohammad Alsulmi, 2, 4, 1.50
Pietro Cicotti, 3, 101, 14.52
Michela Taufer, 4, 352, 53.04