Dione: Profiling Spark applications exploiting graph similarity

Paper Info

Title
Dione: Profiling Spark applications exploiting graph similarity

Abstract
In recent years, distributed processing frameworks such as Apache Spark have been used to run big data applications. Predicting an application's execution time is an important goal, since it helps end users determine which processing resources to reserve. Previous works have examined the problem of profiling Spark applications. However, they mainly focus on specific application types (e.g., machine learning applications) and rely on a large number of previous execution runs. In this work, we aim to overcome these limitations by minimizing the number of past execution runs required for the profiling phase. To cope with limited historical data, we also identify patterns of continuous identical dataset transformations across different applications. We propose an online profiling framework, called Dione, that estimates the running times of new applications even when no historical data is available. Finally, our detailed experimental evaluation uses practical workloads on our local cluster. It shows that our approach accurately predicts the execution times of Spark applications while requiring 30% less training time and monetary cost than current state-of-the-art techniques.
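The abstract does not describe how Dione matches transformation patterns, so the following is only a rough illustration and not the authors' algorithm. It makes three assumptions:

- Each Spark application is reduced to a linear sequence of transformation names, which simplifies Spark's lineage DAG.
- Similarity to a past application is the longest contiguous run of identical transformations the two share.
- That run length is normalized by the new application's length.

The function names, the similarity score, and the example workloads are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return the longest contiguous run of identical transformations shared by a and b.

    Classic longest-common-substring dynamic programming over token sequences.
    """
    best_len, best_end = 0, 0  # best_end is an exclusive end index into a
    # prev[j] = length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return list(a[best_end - best_len:best_end])


def most_similar_app(
    new_app: Sequence[str], history: Dict[str, Sequence[str]]
) -> Tuple[Optional[str], float]:
    """Pick the past application sharing the longest transformation run with new_app.

    Hypothetical score: shared run length divided by the length of new_app.
    Returns (None, 0.0) if nothing is shared.
    """
    best_name: Optional[str] = None
    best_score = 0.0
    for name, ops in history.items():
        run = longest_common_run(new_app, ops)
        score = len(run) / len(new_app) if new_app else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical profiled workloads, each given as its sequence of transformations.
history = {
    "wordcount": ["textFile", "flatMap", "map", "reduceByKey", "saveAsTextFile"],
    "grep": ["textFile", "filter", "count"],
}
new_app = ["textFile", "flatMap", "map", "reduceByKey", "sortByKey", "collect"]

print(longest_common_run(new_app, history["wordcount"]))
print(most_similar_app(new_app, history))  # ('wordcount', 0.6666666666666666)
```

A predictor built on this idea could reuse the matched application's profiled times for the shared run, even with few past runs of the new application itself. How Dione actually combines such matches into a runtime estimate is not stated in the abstract.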

Year	2017
DOI	10.1109/BigData.2017.8257950
Venue	2017 IEEE International Conference on Big Data (Big Data)
Keywords	Profiling, Big Data, Spark, Graph similarity
Document type	Conference
ISSN	2639-1589
ISBN	978-1-5386-2716-7
Citations	1
PageRank	0.35
References	0
Authors	3

Authors

Name	Order	Citations	PageRank
Nikos Zacheilas	1	79	9.40
Stathis Maroulis	2	5	2.07
Vana Kalogeraki	3	1686	124.40
