Title
SMiPE: Estimating the Progress of Recurring Iterative Distributed Dataflows
Abstract
Distributed dataflow systems such as Apache Spark allow the execution of iterative programs at large scale on clusters. In production use, programs are often recurring and have strict latency requirements. Yet, choosing appropriate resource allocations is difficult as runtimes are dependent on hard-to-predict factors, including failures, cluster utilization and dataset characteristics. Offline runtime prediction helps to estimate resource requirements, but cannot take into account inherent variance due to, for example, changing cluster states. We present SMiPE, a system estimating the progress of iterative dataflows by matching a running job to previous executions based on similarity, capturing properties such as convergence, hardware utilization and runtime. SMiPE is not limited to a specific framework due to its black-box approach and is able to adapt to changing cluster states reflected in the current job's statistics. SMiPE automatically adapts its similarity matching to algorithm-specific profiles by training parameters on the job history. We evaluated SMiPE with three iterative Spark jobs and nine datasets. The results show that SMiPE is effective in choosing useful historic runs and predicts runtimes with a mean relative error of 9.1% to 13.1%.
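The matching idea in the abstract can be illustrated with a minimal sketch. This is not SMiPE's implementation: the `Run` schema, the two features (iteration runtime and a convergence value), the mean-absolute-difference distance, the fixed weights `w_runtime` and `w_conv`, and the top-`k` averaging are all assumptions made for illustration. The abstract states that the real system trains its similarity parameters on the job history, a step omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """Per-iteration statistics of one execution (hypothetical schema)."""
    iter_runtimes: List[float]  # seconds spent in each iteration
    convergence: List[float]    # e.g. residual after each iteration


def distance(current: Run, past: Run, w_runtime: float, w_conv: float) -> float:
    """Weighted mean absolute difference over the iterations seen so far."""
    n = len(current.iter_runtimes)
    d_rt = sum(abs(a - b) for a, b in zip(current.iter_runtimes, past.iter_runtimes)) / n
    d_cv = sum(abs(a - b) for a, b in zip(current.convergence, past.convergence)) / n
    return w_runtime * d_rt + w_conv * d_cv


def estimate_remaining(current: Run, history: List[Run], k: int = 2,
                       w_runtime: float = 1.0, w_conv: float = 1.0) -> float:
    """Average the remaining runtime of the k most similar past runs."""
    n = len(current.iter_runtimes)
    if n == 0:
        raise ValueError("current run has no completed iterations yet")
    # Only past runs that progressed at least as far as the current job are comparable.
    candidates = [r for r in history if len(r.iter_runtimes) >= n]
    if not candidates:
        raise ValueError("no comparable historic run")
    ranked = sorted(candidates, key=lambda r: distance(current, r, w_runtime, w_conv))
    nearest = ranked[:k]
    # Remaining runtime of a past run = time it spent after iteration n.
    return sum(sum(r.iter_runtimes[n:]) for r in nearest) / len(nearest)
```

Calling `estimate_remaining(current, history)` with the statistics of a job that has finished a few iterations returns an estimate, in seconds, of the time still needed. The weights are fixed defaults in this sketch, whereas a system along the lines described above would fit them per algorithm from past executions.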
Year
2017
DOI
10.1109/PDCAT.2017.00034
Venue
2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
Keywords
Scalable Data Analysis,Distributed Dataflows,Runtime Prediction,Progress Estimation,Iterative Algorithms
Field
Resource management,Convergence (routing),Spark (mathematics),Task analysis,Computer science,Iterative method,Dataflow,Resource allocation,Approximation error,Distributed computing
DocType
Conference
ISBN
978-1-5386-3152-2
Citations
0
PageRank
0.34
References
12
Authors
4
Name             Order  Citations  PageRank
Jannis Koch      1      0          0.34
Lauritz Thamsen  2      43         9.26
Florian Schmidt  3      268        34.52
Odej Kao         4      1066       96.19