Title
IPAS: intelligent protection against silent output corruption in scientific applications.
Abstract
This paper presents IPAS, an instruction duplication technique that protects scientific applications from silent data corruption (SDC) in their output. The motivation for IPAS is that, due to natural error masking, only a subset of SDC errors actually affects the output of scientific codes; we call these silent output corruption (SOC) errors. Thus, applications require duplication only of code that, when affected by a fault, yields SOC. We use machine learning to identify the instructions that must be protected to avoid SOC and, using a compiler, duplicate only those vulnerable instructions, which significantly reduces the overhead of instruction duplication. In our experiments with five workloads, IPAS reduces the percentage of SOC by up to 90% with a slowdown between 1.04x and 1.35x, which corresponds to as much as 47% less slowdown than state-of-the-art instruction duplication techniques.
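The general idea of selective duplication described in the abstract can be illustrated with a minimal sketch. This is not the IPAS implementation, which operates on compiler-level instructions and uses a learned classifier. Here the `protect` flag stands in for the classifier's decision, and `flip_bit`, `execute`, and `FaultDetected` are hypothetical names introduced only for illustration.

```python
import struct


class FaultDetected(Exception):
    """Raised when the two copies of a protected operation disagree."""


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of a float64 to emulate a transient hardware fault."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (corrupted,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return corrupted


def execute(op, *args, protect: bool, fault=None) -> float:
    """Run `op`, duplicating it only when `protect` is True.

    `protect` is an assumption standing in for a per-instruction decision
    (for example, the output of a trained classifier). `fault`, if given,
    corrupts the first copy's result to emulate a transient error.
    """
    primary = op(*args)
    if fault is not None:
        primary = fault(primary)
    if not protect:
        # Unprotected: no extra work, but a fault passes through silently.
        return primary
    shadow = op(*args)  # the duplicated computation
    # Compare bit patterns so that NaN results do not cause false alarms.
    if struct.pack("<d", primary) != struct.pack("<d", shadow):
        raise FaultDetected("primary and shadow results differ")
    return primary
```

With `protect=False` the operation runs once and a corrupted value is returned silently. With `protect=True` the operation runs twice and a mismatch raises `FaultDetected`. Duplicating only the operations marked as vulnerable is what keeps the overhead below that of duplicating everything.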
Year: 2016
DOI: 10.1145/2854038.2854059
Venue: CGO
Keywords: Resilience, high-performance computing, compiler analysis, machine learning
Field: Silent data corruption, Supercomputer, Computer science, Parallel computing, Real-time computing, Compiler, Corruption
DocType: Conference
ISSN: 2164-2397
ISBN: 978-1-5090-4245-6
Citations: 11
PageRank: 0.53
References: 26
Authors: 5
Name            Order  Citations  PageRank
Ignacio Laguna  1      239        24.56
Martin Schulz   2      167        19.77
D. F. Richards  3      92         12.29
Jon Calhoun     4      47         4.75
Luke Olson      5      235        21.93