Title
Lightweight Fault Tolerance in Pregel-Like Systems
Abstract
Pregel-like systems are popular for iterative graph processing thanks to their user-friendly vertex-centric programming model. However, existing Pregel-like systems adopt only a naïve checkpointing approach for fault tolerance, which saves a large amount of data about the state of computation and significantly degrades failure-free execution performance. Advanced fault tolerance and recovery techniques remain unexplored in the context of Pregel-like systems. This paper proposes a non-invasive lightweight checkpointing (LWCP) scheme that minimizes the data saved to each checkpoint; any additional data required for recovery are generated online from the saved data. This improvement yields a 10x speedup in checkpointing, and integrating it with a recently proposed log-based recovery approach further speeds up recovery when a failure occurs. Extensive experiments verify that the proposed LWCP techniques significantly improve the performance of both checkpointing and recovery in a Pregel-like system.
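The idea of saving less and regenerating the rest can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes, for illustration only, that the "additional data" are the in-flight messages of a superstep and that they can be recomputed purely from vertex values. The toy graph `GRAPH` and the helpers `make_messages`, `superstep`, `heavy_checkpoint`, `light_checkpoint`, and `recover_light` are all hypothetical names.

```python
import pickle

# Hypothetical toy graph: vertex id -> list of out-neighbours.
GRAPH = {0: [1, 2], 1: [2], 2: [0]}


def make_messages(values):
    """Derive a superstep's messages purely from the vertex values."""
    inbox = {v: [] for v in GRAPH}
    for v, neighbours in GRAPH.items():
        share = values[v] / len(neighbours)
        for n in neighbours:
            inbox[n].append(share)
    return inbox


def superstep(values, inbox):
    """PageRank-style update, then generate messages for the next superstep."""
    new_values = {v: 0.15 / len(GRAPH) + 0.85 * sum(inbox[v]) for v in GRAPH}
    return new_values, make_messages(new_values)


def heavy_checkpoint(values, inbox):
    """Conventional checkpoint (assumed): vertex values plus all messages."""
    return pickle.dumps({"values": values, "inbox": inbox})


def light_checkpoint(values):
    """Lightweight checkpoint (assumed): vertex values only."""
    return pickle.dumps({"values": values})


def recover_light(blob):
    """Reload the vertex values and regenerate the messages from them."""
    values = pickle.loads(blob)["values"]
    return values, make_messages(values)


if __name__ == "__main__":
    values = {v: 1.0 / len(GRAPH) for v in GRAPH}
    inbox = make_messages(values)
    for _ in range(3):
        values, inbox = superstep(values, inbox)

    blob = light_checkpoint(values)
    recovered_values, recovered_inbox = recover_light(blob)
    print(recovered_values == values, recovered_inbox == inbox)
    print(len(blob), "bytes vs", len(heavy_checkpoint(values, inbox)), "bytes")
```

In this sketch the lightweight checkpoint is smaller because it omits the messages, and recovery pays for that by re-running message generation once. The approach only works when messages depend on nothing but the saved state, which is an assumption of the sketch rather than a claim from the abstract.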
Year
2019
DOI
10.1145/3337821.3337823
Field
Computer science, Parallel computing, Fault tolerance, Distributed computing
DocType
Conference
ISSN
978-1-4503-6295-5
ISBN
978-1-4503-6295-5
Citations
0
PageRank
0.34
References
0
Authors
5
Name | Order | Citations PageRank
Da Yan | 1 | 38734.45
James Cheng | 2 | 2044101.89
Hongzhi Chen | 3 | 4713.00
Cheng Long | 4 | 19423.70
Purushotham Bangalore | 5 | 14323.66