Title
Dynamically Improving Resiliency to Timing Errors for Stream Processing Workloads
Abstract
Large-scale data processing paradigms, such as stream processing, are widespread in academic and corporate workloads. These environments are commonly subject to real-time requirements, such as latency and throughput, and resiliency requirements to node or network failures. These requirements have generally been approached as separate problems. Intermittent timing delays due to factors such as garbage collection can further complicate the management of the stream processing workload. Insufficient resource allocations can also lead to poor performance. Currently, tuning these applications is done manually. We show that improper configuration can greatly affect performance. It is reported that even 100ms of increased latency in online sales platforms can potentially result in lower sales. In this paper we propose Dynamo, a framework and monitor that implements a methodology for addressing both the performance and timing error problems by increasing the resiliency of stream processing frameworks to timing delays. Dynamo autonomously adjusts the resource allocation by using a performance profile that is generated through application profiling. Dynamo partitions an application's allocated resources into active and passive partitions that are dynamically adjusted to match an application's multi-modal behavior. The distribution of resources determines the amount of computation that Dynamo can duplicate and process redundantly, thereby reducing the probability of timing errors that affect a tuple's total execution time. In our experiments, we observed improvements in the number of tuples with missed deadlines. Our results show that Dynamo is able to consistently improve the resiliency to timing errors over a number of differing occurrence rates. Furthermore, we show that the improvement in the number of missed deadlines increases with the amount of spare resources, with a 71.40% reduction in the best case.
Year
2017
DOI
10.1109/PDCAT.2017.00080
Venue
2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
Keywords
stream processing, fault tolerance, garbage collection, resiliency, real-time, resource scheduling
Field
Resource management, Data processing, Tuple, Latency (engineering), Computer science, Real-time computing, Resource allocation, Garbage collection, Throughput, Stream processing, Distributed computing
DocType
Conference
ISBN
978-1-5386-3152-2
Citations
0
PageRank
0.34
References
5
Authors
3
Name | Order | Citations | PageRank
Geoffrey Phi C. Tran | 1 | 8 | 2.26
John Paul Walters | 2 | 267 | 20.45
Stephen P. Crago | 3 | 168 | 20.38