Abstract | ||
---|---|---|
High-performance computing (HPC) is increasingly subject to faulty computations. The frequency of silent data corruptions (SDCs) in particular is expected to increase in emerging machines, requiring HPC applications to handle SDCs. In this paper, we propose a robust fault injector, structured as an LLVM compiler pass, that allows simulation of SDCs in various applications. Although fault injection locations are enumerated at compile time, their activation occurs purely at runtime and is based on a user-provided fault distribution. The robustness of our fault injector lies in the ability to augment the runtime injection logic on a per-application basis. This allows tighter control over the spatial, temporal, and probabilistic characteristics of injected faults. The usability, scalability, and robustness of our fault injector are demonstrated by injecting faults into an algebraic multigrid solver. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1007/978-3-319-14325-5_47 | Lecture Notes in Computer Science |
Field | DocType | Volume |
Supercomputer,Soft error,Computer science,Compile time,Parallel computing,Robustness (computer science),Compiler,Solver,Fault injection,Distributed computing,Scalability,Embedded system | Conference | 8805 |
ISSN | Citations | PageRank |
0302-9743 | 15 | 0.65 |
References | Authors | |
13 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jon Calhoun | 1 | 47 | 4.75 |
Luke Olson | 2 | 235 | 21.93 |
M. Snir | 3 | 3984 | 520.82 |