Title
Reducing Fault-tolerant Overhead for Distributed Stream Processing with Approximate Backup
Abstract
The stream processing model continuously processes online data in a one-pass fashion, which can make it more vulnerable to failures than offline data processing schemes. Checkpoint-based fault-tolerant methods have been widely used to enhance the reliability of stream processing systems. To ensure exact data recovery upon failure, full-backup mechanisms store a complete copy of the data, which introduces substantial runtime overhead and increases output latency. Meanwhile, a wide range of online processing applications prefer quick-and-dirty results with a slight degradation in accuracy to delayed exact results. This paper introduces a novel approximate fault-tolerant problem (OAFP) with the objective of reducing the failure-free fault-tolerant overhead while ensuring that user-defined output accuracy requirements are met upon failure. We present an approximate fault-tolerant scheme based on a sampling backup mechanism and study the trade-off between fault-tolerant overhead and output accuracy in stream processing systems. We propose two algorithms to compute backup plans for both single-node failure and correlated failure scenarios. Extensive experiments with different types of stream topologies are conducted on our simulator to verify the correctness and effectiveness of our approach. We prove that our solution guarantees the output accuracy requirement with minimum fault-tolerant (FT) latency for directed acyclic graph (DAG) stream topologies with single-node failures.
Year
2020
DOI
10.1109/ICCCN49398.2020.9209717
Venue
2020 29th International Conference on Computer Communications and Networks (ICCCN)
DocType
Conference
ISSN
1095-2055
ISBN
978-1-7281-6607-0
Citations
0
PageRank
0.34
References
0
Authors
5
Name, Order, Citations, PageRank
Yuan Zhuang, 1, 6, 5.84
Xiaohui Wei, 2, 39154.44
Hongliang Li, 3, 0, 0.68
Mingkai Hou, 4, 0, 0.34
Yundi Wang, 5, 0, 0.34