Title
A Robust Technique to Make a 2D Advection Solver Tolerant to Soft Faults
Abstract
We present a general technique for solving partial differential equations, called robust stencils, which makes the solver tolerant to soft faults, i.e. bit flips arising in memory or CPU calculations. We show how it can be applied to a two-dimensional Lax-Wendroff solver. The resulting 2D robust stencils are derived by applying their 1D counterparts orthogonally. Combinations of 3 to 5 base stencils can then be created. We describe how these are implemented in a parallel advection solver. Various robust stencil combinations are explored, representing tradeoffs between performance and robustness. The results indicate that the 3-stencil robust combinations are slightly faster on large parallel workloads than Triple Modular Redundancy (TMR). They also have one third of TMR's memory footprint. We expect the improvement to be significant if suitable optimizations are performed. Because faults are avoided each time new points are computed, the proposed stencils are also about as robust to faults as TMR over a large range of error rates. The technique can be generalized to 3D (or higher dimensions) with similar benefits.
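The idea of avoiding faults each time a new point is computed can be illustrated with a minimal 1D sketch. It is not the paper's implementation: the three offset sets in `STENCILS`, the use of a median to combine the candidate updates, and the names `stencil_update` and `robust_step` are all assumptions made for illustration. Each candidate evaluates a quadratic interpolant through three grid values at the departure point, which reduces to the classic Lax-Wendroff update for offsets (-1, 0, 1). Only a single time step is shown, and the long-term stability of the wider stencils is not examined.

```python
import math
from statistics import median

# Hypothetical offset sets (not taken from the paper). They are pairwise
# disjoint, so one corrupted grid value can contaminate at most one of the
# three candidate updates for any given point.
STENCILS = ((-1, 0, 1), (-3, -2, 2), (-4, 3, 4))


def stencil_update(u, i, nu, offsets):
    """Advance point i by one step of linear advection on a periodic grid.

    Evaluates the quadratic Lagrange interpolant through u[i + o], for o in
    offsets, at position -nu (in grid cells), where nu = a*dt/dx is the
    Courant number. With offsets (-1, 0, 1) this is the Lax-Wendroff update.
    """
    n = len(u)
    x = -nu
    total = 0.0
    for j, oj in enumerate(offsets):
        weight = 1.0
        for k, ok in enumerate(offsets):
            if k != j:
                weight *= (x - ok) / (oj - ok)
        total += weight * u[(i + oj) % n]
    return total


def robust_step(u, nu, stencils=STENCILS):
    """Assumed combination rule: take the median of the candidate updates."""
    return [
        median(stencil_update(u, i, nu, s) for s in stencils)
        for i in range(len(u))
    ]


# Demo: one step on a sine wave with a single corrupted value.
n, nu = 64, 0.5
u = [math.sin(2 * math.pi * k / n) for k in range(n)]
exact = [math.sin(2 * math.pi * (k - nu) / n) for k in range(n)]
u[10] = 1.0e6  # simulated soft fault (e.g. a flipped exponent bit)

plain = [stencil_update(u, i, nu, STENCILS[0]) for i in range(n)]
robust = robust_step(u, nu)

plain_err = max(abs(p - e) for p, e in zip(plain, exact))
robust_err = max(abs(r - e) for r, e in zip(robust, exact))
print(f"plain Lax-Wendroff max error: {plain_err:.3e}")
print(f"median-of-stencils max error: {robust_err:.3e}")
```

In this sketch the corrupted value spoils the plain update at the points whose stencil touches it, while the median discards the one outlying candidate, so the new time level contains no trace of the fault. Unlike TMR, only one copy of the field is stored, which is in line with the memory saving the abstract reports.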
Year: 2016
DOI: 10.1016/j.procs.2016.05.505
Venue: ICCS
Keywords: exascale computing, fault-tolerance, partial differential equations, robust stencils, advection equation, parallel computing, resilient computing
Field: Exascale computing, Computer science, Stencil, Parallel computing, Triple modular redundancy, Robustness (computer science), Fault tolerance, Solver, Memory footprint, Partial differential equation
DocType: Conference
Volume: 80
Issue: C
ISSN: 1877-0509
Citations: 1
PageRank: 0.37
References: 6
Authors: 6
Name                  Order  Citations  PageRank
Peter E. Strazdins    1      110        17.83
Brian Lee             2      1          0.37
Brendan Harding       3      33         5.41
Jackson Mayo          4      43         7.97
Jaideep Ray           5      198        24.42
Robert C. Armstrong   6      100        21.51