Title
Toward fault-tolerant parallel-in-time integration with PFASST
Abstract
Different strategies for fault-tolerant parallel-in-time integration are presented. A theoretical model for the overhead is given. Examples of diffusive and advective type are shown to confirm the effectiveness of the strategies. Further research directions are indicated, and open questions are discussed. We introduce and analyze different strategies for the parallel-in-time integration method PFASST to recover from hard faults and subsequent data loss. Since PFASST stores solutions at multiple time steps on different processors, information from adjacent steps can be used to recover after a processor has failed. PFASST's multi-level hierarchy allows the coarse level to be used for correcting the reconstructed solution, which can help to minimize overhead. A theoretical model is devised linking overhead to the number of additional PFASST iterations required for convergence after a fault. The potential efficiency of different strategies is assessed in terms of required additional iterations for examples of diffusive and advective type.
Year
2015
DOI
10.1016/j.parco.2016.12.001
Venue
Parallel Computing
Keywords
Algorithm-based fault tolerance, Resilience, Parallel-in-time integration, Gray-Scott model, Boussinesq equations
Field
Convergence (routing), Data loss, Computer science, Parallel computing, Real-time computing, Fault tolerance, Hierarchy, Distributed computing
DocType
Journal
Volume
62
Issue
C
ISSN
0167-8191
Citations
0
PageRank
0.34
References
0
Authors
2
Name
Order
Citations
PageRank
Robert Speck1525.86
Daniel Ruprecht27110.02