Title
Fast, Efficient Performance Predictions for Big Data Applications
Abstract
In recent years, we have observed rapid growth in the deployment of machine learning workloads on big data analytics frameworks such as Apache Spark and Apache Flink. These workloads are typically represented as processing graphs, run on shared infrastructures, and often have far more demanding resource requirements than those found in traditional enterprise settings. Predicting the execution times of these workloads is important because they often run on shared public or private infrastructures, so their execution is strongly affected by resource sharing, by the underlying hardware, and by the choice of the configuration parameters the frameworks expose. In this work, we propose a fast and efficient performance prediction system for estimating the execution times of big data workloads. It exploits the fact that workloads are represented as processing graphs that often share similar structures and parameters, so the performance models already built for deployed workloads can be used to estimate the end-to-end execution time of a new workload. Previous work assumes that a large number of profiling runs is available for building the prediction models. This assumption does not always hold, and more elaborate mechanisms are then needed. Our detailed experimental evaluation on our local Spark cluster shows that our approach accurately predicts the execution time of a wide range of Spark workloads.
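The abstract does not say which graph-similarity measure or which per-workload models the system uses, so the following is only a minimal sketch of the general idea of reusing a model from a structurally similar workload. It assumes, hypothetically, that each processing graph is reduced to its set of directed operator edges, that similarity is Jaccard similarity over those edge sets, and that each already-profiled workload has a simple linear runtime model in input size. The names `Workload`, `LinearModel`, `jaccard` and `predict_runtime`, and all numbers in the example, are illustrative and not taken from the paper.

```python
# Hypothetical sketch, not the authors' method: reuse the runtime model of the
# most structurally similar, already-profiled workload for a new workload.
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]  # (upstream operator, downstream operator)


@dataclass(frozen=True)
class Workload:
    """A workload's processing graph, reduced to its set of directed operator edges."""
    name: str
    edges: FrozenSet[Edge]


@dataclass(frozen=True)
class LinearModel:
    """Assumed per-workload model: runtime grows linearly with input size."""
    intercept_s: float
    secs_per_gb: float

    def predict(self, input_gb: float) -> float:
        return self.intercept_s + self.secs_per_gb * input_gb


def jaccard(a: FrozenSet[Edge], b: FrozenSet[Edge]) -> float:
    """Edge-set Jaccard similarity in [0, 1]; two empty graphs count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def predict_runtime(
    new: Workload,
    input_gb: float,
    history: List[Tuple[Workload, LinearModel]],
) -> Tuple[str, float, float]:
    """Borrow the model of the nearest profiled workload.

    Returns (name of the borrowed workload, similarity, predicted seconds).
    """
    if not history:
        raise ValueError("no profiled workloads to borrow a model from")
    best_wl, best_model = max(history, key=lambda pair: jaccard(new.edges, pair[0].edges))
    return best_wl.name, jaccard(new.edges, best_wl.edges), best_model.predict(input_gb)


if __name__ == "__main__":
    wordcount = Workload("wordcount", frozenset({
        ("textFile", "flatMap"), ("flatMap", "map"), ("map", "reduceByKey"),
    }))
    sort = Workload("sort", frozenset({("textFile", "map"), ("map", "sortByKey")}))
    history = [
        (wordcount, LinearModel(intercept_s=12.0, secs_per_gb=4.0)),
        (sort, LinearModel(intercept_s=20.0, secs_per_gb=6.5)),
    ]
    # A new, never-profiled job that extends wordcount with a save step.
    new_job = Workload("wordcount_save", frozenset({
        ("textFile", "flatMap"), ("flatMap", "map"), ("map", "reduceByKey"),
        ("reduceByKey", "saveAsTextFile"),
    }))
    print(predict_runtime(new_job, input_gb=10.0, history=history))
    # prints ('wordcount', 0.75, 52.0)
```

Picking a single nearest neighbour is the simplest possible choice; a graph edit distance or a similarity-weighted average over several profiled workloads would be natural alternatives, but whether the authors use either is not stated in the abstract.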
Year: 2019
DOI: 10.1109/ISORC.2019.00034
Venue: 2019 IEEE 22nd International Symposium on Real-Time Distributed Computing (ISORC)
Keywords: Graph Similarity, Predictions, Distributed Systems
DocType: Conference
ISSN: 1555-0885
ISBN: 978-1-7281-0152-1
Citations: 0
PageRank: 0.34
References: 20
Authors: 4
Name | Order | Citations | PageRank
Stathis Maroulis | 1 | 5 | 2.07
Nikos Zacheilas | 2 | 79 | 9.40
Thanasis Theocharis | 3 | 0 | 0.34
Vana Kalogeraki | 4 | 1686 | 124.40