Title
Machines Tuning Machines: Configuring Distributed Stream Processors with Bayesian Optimization
Abstract
Modern distributed computing frameworks such as Apache Hadoop, Spark, or Storm distribute the workload of applications across a large number of machines. Whilst they abstract away the details of distribution, they do require the programmer to set a number of configuration parameters before deployment. These parameter settings (usually) have a substantial impact on execution efficiency. Finding the right values for these parameters is considered a difficult task and requires domain, application, and framework expertise. In this paper, we propose a machine learning approach to the problem of configuring a distributed computing framework. Specifically, we propose using Bayesian Optimization to find good parameter settings. In an extensive empirical evaluation, we show that Bayesian Optimization can effectively find good parameter settings for four different stream processing topologies implemented in Apache Storm, resulting in significant gains over a parallel linear approach.
Year
2015
DOI
10.1109/CLUSTER.2015.13
Venue
Cluster Computing
Keywords
distributed stream processing, configuration, optimization, Storm
Field
Software deployment, Programmer, Spark (mathematics), Workload, Computer science, Parallel computing, Bayesian optimization, Parallel processing, Real-time computing, Network topology, Stream processing, Distributed computing
DocType
Conference
ISSN
1552-5244
Citations
4
PageRank
0.43
References
23
Authors
3
Name               Order   Citations   PageRank
Lorenz Fischer     1       15          0.97
Shen Gao           2       5           0.79
Abraham Bernstein  3       156         13.80