Title
Encore: low-cost, fine-grained transient fault recovery
Abstract
To meet an insatiable consumer demand for greater performance at lower power, silicon technology has scaled to unprecedented dimensions. However, the pursuit of faster processors and longer battery life has come at the cost of reliability. Given the rise of processor reliability as a first-order design constraint, there has been growing interest in low-cost, non-intrusive techniques for transient fault detection. Many of these recent proposals count on the availability of hardware recovery mechanisms. Although common in aggressive out-of-order cores, hardware support for speculative rollback and recovery is less common in lower-end commodity processors. This paper presents Encore, a software-based fault recovery mechanism tailored for lower-cost systems that lack native hardware support for speculative rollback recovery. Encore combines program analysis, profile data, and simple code transformations to create statistically idempotent code regions that can recover from faults at very little cost. Using this software-only, compiler-based approach, Encore can recover from transient faults without specialized hardware or the costs of traditional, full-system checkpointing solutions. Experimental results show that Encore, with just 14% runtime overhead, can safely recover from 97% of transient faults on average when coupled with existing detection schemes.
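As a rough illustration of why idempotent regions make recovery cheap, the sketch below re-executes a region from its entry point after a detected fault. It is not Encore's compiler implementation: the names `run_region`, `FaultDetected`, and `make_faulty_increment` are hypothetical, the fault is simulated, and the idea of saving only the inputs a region overwrites is an assumption made for this example.

```python
class FaultDetected(Exception):
    """Stands in for a signal from a separate fault detection scheme."""


def run_region(region, state, clobbered_keys, max_retries=3):
    """Run `region` on `state`, re-executing it from its entry on a fault.

    `clobbered_keys` lists the live-in entries the region overwrites. Saving
    only those (an assumption of this sketch) is what lets the region be
    re-executed without a full checkpoint of `state`.
    """
    saved = {k: state[k] for k in clobbered_keys}  # selective checkpoint
    for _ in range(max_retries + 1):
        try:
            region(state)
            return state
        except FaultDetected:
            state.update(saved)  # restore overwritten inputs, then retry
    raise RuntimeError("region did not complete within the retry budget")


def make_faulty_increment():
    """Build a region that overwrites state['x'] and faults exactly once."""
    faults_left = [1]

    def region(state):
        state["x"] = state["x"] + 1   # reads, then overwrites its own input
        if faults_left[0] > 0:
            faults_left[0] -= 1
            state["x"] += 1000        # simulated transient corruption
            raise FaultDetected()
        state["y"] = state["x"] * 2

    return region


if __name__ == "__main__":
    state = {"x": 1, "y": 0}
    run_region(make_faulty_increment(), state, clobbered_keys=["x"])
    print(state)
```

Passing an empty `clobbered_keys` list shows the failure mode: the retry starts from the corrupted value of `x`, so re-execution alone does not recover the correct result.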
Year
2011
DOI
10.1145/2155620.2155667
Venue
MICRO
Keywords
fine-grained transient fault recovery, hardware recovery mechanism, speculative rollback recovery, transient fault detection, specialized hardware, idempotent code region, hardware support, transient fault, native hardware support, detection scheme, software-based fault recovery mechanism, program analysis, first order, out of order
Field
Computer science, Fault detection and isolation, Parallel computing, Recovery mechanism, Real-time computing, Compiler, Software, Rollback recovery, Program analysis, Rollback, Embedded system
DocType
Conference
ISSN
1072-4451
ISBN
978-1-5090-6605-6
Citations
33
PageRank
1.02
References
28
Authors
5
Name              Order  Citations  PageRank
Shuguang Feng     1      306        12.96
Shantanu Gupta    2      390        16.39
Amin Ansari       3      361        15.88
Scott Mahlke      4      4811       312.08
David I. August   5      2245       123.66