Title
Dione: Profiling Spark Applications Exploiting Graph Similarity
Abstract
In recent years, distributed processing frameworks such as Apache Spark have been used to run big data applications. Predicting an application's execution time is an important goal, since it helps the end user determine how many processing resources to reserve. Previous works have examined the problem of profiling Spark applications, but they mainly focus on specific application types (e.g., machine learning applications) and rely on the existence of a large number of previous execution runs. In this work we aim to overcome these limitations by minimizing the number of past execution runs needed for the profiling phase. Furthermore, to cope with the limited availability of historical data, we identify patterns of continuous identical dataset transformations shared between different applications. We propose an online profiling framework, called Dione, that estimates the running times of new applications even if no historical data is available. Finally, our detailed experimental evaluation, using practical workloads on our local cluster, shows that our approach accurately predicts the execution times of Spark applications and requires 30% less training time and monetary cost than current state-of-the-art techniques.
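The idea of spotting continuous identical dataset transformations across applications can be illustrated with a minimal sketch. It assumes each application is reduced to a linear sequence of Spark operation names; Dione itself compares application graphs, so the matching used here (a longest-common-substring search) and the similarity score (matched length divided by the longer sequence length) are illustrative assumptions, not the paper's algorithm. The function names `longest_common_run`, `run_similarity`, `most_similar` and the `history` data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> Tuple[int, int, int]:
    """Return (length, start_in_a, start_in_b) of the longest contiguous run of
    identical operations shared by a and b (classic longest-common-substring DP)."""
    best_len, best_i, best_j = 0, 0, 0
    # prev[j] = length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one position earlier in both sequences.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_i, best_j = i - best_len, j - best_len
        prev = curr
    return best_len, best_i, best_j


def run_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical score: longest shared run relative to the longer application."""
    if not a or not b:
        return 0.0
    length, _, _ = longest_common_run(a, b)
    return length / max(len(a), len(b))


def most_similar(new_app: Sequence[str],
                 history: Dict[str, List[str]]) -> Tuple[Optional[str], float]:
    """Pick the past application whose operation sequence best matches new_app."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, ops in history.items():
        score = run_similarity(new_app, ops)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical past applications, each as a sequence of Spark operation names.
history = {
    "wordcount": ["textFile", "flatMap", "map", "reduceByKey", "saveAsTextFile"],
    "grep": ["textFile", "filter", "count"],
}
new_app = ["textFile", "flatMap", "map", "reduceByKey", "sortByKey", "saveAsTextFile"]

print(longest_common_run(new_app, history["wordcount"]))  # (4, 0, 0)
print(most_similar(new_app, history))  # ('wordcount', 0.6666666666666666)
```

In this toy example the new application shares its first four operations with the past word-count run, so that run would be the natural candidate whose recorded execution data a profiler could reuse. A real graph-based approach would also handle branching lineage (joins, unions), which a linear sequence cannot capture.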
Year: 2017
DOI: 10.1109/BigData.2017.8257950
Venue: 2017 IEEE International Conference on Big Data (Big Data)
Keywords: Profiling, Big Data, Spark, Graph similarity
DocType: Conference
ISSN: 2639-1589
ISBN: 978-1-5386-2716-7
Citations: 1
PageRank: 0.35
References: 0
Authors: 3
Name | Order | Citations | PageRank
Nikos Zacheilas | 1 | 79 | 9.40
Stathis Maroulis | 2 | 5 | 2.07
Vana Kalogeraki | 3 | 1686 | 124.40