Title
Towards Formal Approaches to System Resilience
Abstract
Technology scaling and techniques such as dynamic voltage/frequency scaling are predicted to increase the number of transient faults in future processors. Error detectors implemented in hardware are often energy inefficient, as they are "always on." While software-level error detection can augment hardware-level detectors, creating highly effective detectors in software remains a challenge. In this paper, we first present a new LLVM-level fault injector called KULFI that helps simulate faults occurring within CPU state elements in a versatile manner. Second, using KULFI, we study the behavior of a family of well-known and simple algorithms under error injection. (We choose a family of sorting algorithms for this study.) We then propose a promising way to interpret our empirical results using a formal model that builds on the idea of predicate state transition diagrams. After introducing the basic abstraction underlying our predicate transition diagrams, we draw connections to the level of resilience empirically observed during fault injection studies. Building on the observed connections, we develop a simple yet effective predicate-abstraction-based fault detector. While still in its initial stages, ours is believed to be the first study that offers a formal way to interpret and compare fault injection results obtained from algorithms within one family. Given the absolutely unpredictable nature of what a fault can do to a computation in general, our approach may help designers choose, from among a class of algorithms, the one that is most resilient.
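To make the idea of a predicate-based detector concrete, here is a minimal sketch under stated assumptions. It does not reproduce KULFI, which injects faults at the LLVM IR level into CPU state elements; instead it models a transient fault as a single bit flip in one array element of an insertion sort, at source level. The detector checks one predicate over the abstract state after each outer iteration, "the prefix a[0..i] is sorted", and reports the first iteration where it fails. The function names (`flip_bit`, `is_sorted`, `insertion_sort_with_fault`) and the choice of predicate are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Optional, Tuple


def flip_bit(value: int, bit: int) -> int:
    """Model a transient single-bit fault by XOR-ing bit `bit` of `value`."""
    return value ^ (1 << bit)


def is_sorted(xs: List[int]) -> bool:
    """Predicate over the abstract state: is this list non-decreasing?"""
    return all(xs[k] <= xs[k + 1] for k in range(len(xs) - 1))


def insertion_sort_with_fault(
    data: List[int],
    fault_step: Optional[int] = None,
    fault_index: int = 0,
    fault_bit: int = 0,
) -> Tuple[List[int], Optional[int]]:
    """Insertion sort with an optional injected bit flip and a predicate check.

    Hypothetical illustration: after outer iteration `fault_step`, one bit of
    a[fault_index] is flipped. After every outer iteration i, the detector
    checks the invariant "a[0..i] is sorted" and records the first iteration
    where it fails. Returns (final array, first violating iteration or None).
    """
    a = list(data)
    first_violation: Optional[int] = None
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements of the sorted prefix one slot to the right.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
        # Inject the (simulated) transient fault at the chosen step.
        if i == fault_step:
            a[fault_index] = flip_bit(a[fault_index], fault_bit)
        # Predicate-based detector: the prefix a[0..i] must be sorted.
        if first_violation is None and not is_sorted(a[: i + 1]):
            first_violation = i
    return a, first_violation


if __name__ == "__main__":
    data = [5, 3, 8, 1, 9, 2]
    print(insertion_sort_with_fault(data))  # ([1, 2, 3, 5, 8, 9], None)
    # Fault in the already-sorted prefix: detected at iteration 3.
    print(insertion_sort_with_fault(data, fault_step=3, fault_index=0, fault_bit=4))
    # Fault in the not-yet-processed suffix: output is sorted, so not detected.
    print(insertion_sort_with_fault(data, fault_step=1, fault_index=5, fault_bit=3))
```

The last call shows why a single predicate is only a partial detector in this sketch: the flipped value (2 becomes 10) lands in unprocessed input, so the output [1, 3, 5, 8, 9, 10] is sorted yet is not a permutation of the input, and the sortedness predicate never fires.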
Year: 2013
DOI: 10.1109/PRDC.2013.14
Venue: Dependable Computing
Keywords: error injection, fault injection result, system resilience, error detector, software-level error detection, transient fault, llvm-level fault injector, predicate-abstraction-based fault detector, towards formal approaches, formal model, fault injection study, cpu state element, fault tolerance
Field: Stuck-at fault, Computer science, Software fault tolerance, Real-time computing, Theoretical computer science, Error detection and correction, Fault tolerance, Frequency scaling, Fault model, Fault injection, Sorting algorithm, Distributed computing
DocType: Conference
Citations: 31
PageRank: 0.98
References: 20
Authors: 4
Name | Order | Citations | PageRank
Vishal Chandra Sharma | 1 | 31 | 1.32
Arvind Haran | 2 | 31 | 0.98
Zvonimir Rakamaric | 3 | 327 | 21.22
Ganesh Gopalakrishnan | 4 | 1619 | 130.11