Title
Fault Recovery Methods For Asynchronous Linear Solvers
Abstract
This study examines the soft error vulnerability of asynchronous iterative methods, with a focus on stationary iterative solvers such as Jacobi. A theoretical investigation into the performance of asynchronous iterative methods is presented and used to motivate several fault recovery methods for asynchronous linear solvers. The numerical experiments use a hybrid-parallel implementation in which the computational work is distributed over multiple nodes using MPI and parallelized on each node using OpenMP; a series of runs is conducted to measure both the impact of soft faults and the effectiveness of the recovery methods. Trials compare two models for simulating the occurrence of a fault, as well as techniques for recovering from its effects. The results show that the proposed strategies can effectively recover from the impact of a fault and that the numerical model for simulating soft faults consistently produces fault effects that enable the investigation and tuning of recovery techniques.
Year
2021
DOI
10.1007/s10766-020-00676-w
Venue
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING
Keywords
Soft fault, Fault tolerance, Fault models, Asynchronous iteration, Linear system solver
DocType
Journal
Volume
49
Issue
1
ISSN
0885-7458
Citations
0
PageRank
0.34
References
0
Authors
3
Name              Order  Citations  PageRank
Evan Coleman      1      1          2.72
Erik J. Jensen    2      0          0.34
Masha Sosonkina   3      272        45.62