Title
Continuously Improving the Resource Utilization of Iterative Parallel Dataflows
Abstract
Parallel dataflow systems like Apache Flink enable the analysis of large datasets with iterative programs. However, allocating a cost-effective set of resources for such jobs is difficult, as the resource utilization depends on many factors such as dataset size, key value distributions, computational complexity of programs, and the underlying hardware. Moreover, some of these factors are not well known before the execution. For example, data statistics such as key value distributions are often not available beforehand. For this reason, we propose to improve the resource utilization at runtime by exploiting the repetitive nature of iterative dataflow programs. Based on runtime statistics gathered in previous iterations, the resource allocation is adapted dynamically at the synchronization barriers between iterations. This approach has two advantages: First, detailed statistics can be available at barriers, even for task pipelines that are executed in parallel. Second, dataflows can be adapted at barriers without complex handling of intermediate task state. This paper presents a prototype integrated with Apache Flink and an evaluation on a cluster with 480 cores. One experiment shows a 57% reduction of the job runtime by allocating more resources for a shorter time; another shows a release of up to 40% surplus resources without significantly extending the job runtime.
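The barrier-based adaptation described in the abstract can be illustrated with a small simulation. This is only a toy sketch under stated assumptions, not the paper's prototype or its scaling policy: the names `IterationStats`, `next_parallelism`, and `run_iterative_job`, the single CPU-utilization statistic, and the doubling/halving thresholds are all hypothetical, and no Apache Flink API is used.

```python
from dataclasses import dataclass


@dataclass
class IterationStats:
    """Statistics gathered during one iteration (hypothetical field)."""
    cpu_utilization: float  # mean CPU utilization of the workers, in [0, 1]


def next_parallelism(current, stats, low=0.5, high=0.9,
                     min_workers=1, max_workers=64):
    """Pick the worker count for the next iteration.

    Assumed toy rule: double the workers when they are saturated,
    halve them when they are mostly idle, otherwise keep the allocation.
    """
    if stats.cpu_utilization > high:
        proposal = current * 2
    elif stats.cpu_utilization < low:
        proposal = current // 2
    else:
        proposal = current
    # Clamp the proposal to the allowed range of workers.
    return max(min_workers, min(max_workers, proposal))


def run_iterative_job(utilization_per_iteration, start_workers):
    """Simulate an iterative job and return the workers used per iteration.

    The utilization values are supplied as input to keep the sketch
    self-contained; a real system would measure them while running.
    """
    workers = start_workers
    history = []
    for utilization in utilization_per_iteration:
        history.append(workers)  # the iteration runs with this allocation
        # Synchronization barrier: the iteration has finished, so its
        # statistics are complete and no task holds intermediate state.
        stats = IterationStats(cpu_utilization=utilization)
        workers = next_parallelism(workers, stats)
    return history
```

For example, `run_iterative_job([0.95, 0.95, 0.7, 0.3], 4)` runs four iterations starting with 4 workers and adjusts the allocation only between iterations, which mirrors the idea of adapting at barriers rather than in the middle of an iteration.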
Year: 2016
DOI: 10.1109/ICDCSW.2016.20
Venue: 2016 IEEE 36th International Conference on Distributed Computing Systems Workshops (ICDCSW)
Keywords: Parallel Dataflows, Scalable Data Processing, Resource Utilization, Dynamic Scaling
Field: Synchronization, Pipeline transport, Computer science, Parallel computing, Dynamic scaling, Resource allocation, Dataflow, Computational complexity theory, Distributed computing
DocType: Conference
ISSN: 1545-0678
ISBN: 978-1-5090-3687-5
Citations: 1
PageRank: 0.35
References: 19
Authors: 3
Name | Order | Citations | PageRank
Lauritz Thamsen | 1 | 43 | 9.26
Thomas Renner | 2 | 18 | 5.47
Odej Kao | 3 | 1066 | 96.19