Title
Assuring application-level correctness against soft errors
Abstract
Traditionally, research in fault tolerance has required architectural state to be numerically perfect for program execution to be correct. However, in many programs, even if execution is not 100% numerically correct, the program can still appear to execute correctly from the user's perspective. To quantify user satisfaction, application-level fidelity metrics (such as PSNR) can be used. The output of such applications is defined to be correct if the fidelity metrics satisfy a certain threshold. However, such applications still contain instructions whose outputs are critical, i.e., their correctness decides whether the overall quality of the program output is acceptable. In this paper, we present an analysis technique for identifying such critical program segments. More importantly, our technique is capable of guaranteeing application-level correctness through a combination of static analysis and runtime monitoring. Our static analysis consists of data flow analysis followed by control flow analysis to find static critical instructions which affect several instructions. Critical instructions are further refined into likely non-critical and likely critical sets in a profiling phase. At runtime, we use a monitoring scheme to monitor likely non-critical instructions and take remedial actions if some likely non-critical instructions become critical. Based on this analysis, we minimize the number of instructions that are duplicated and checked at runtime using a software-based fault detection and recovery technique [20]. Put together, our approach can lead to 22% average energy savings for multimedia applications while guaranteeing application-level correctness, when compared to a recent work [9], which cannot guarantee application-level correctness. Compared to the approach proposed in [20], which guarantees both application-level and numerical correctness, our method achieves 79% energy reduction.
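The threshold-based notion of correctness described in the abstract can be illustrated with a minimal sketch. It assumes 8-bit samples and a 30 dB acceptance threshold; the function names `psnr` and `is_acceptable`, the threshold value, and the sample data are all hypothetical and are not taken from the paper.

```python
import math


def psnr(reference, observed, max_value=255):
    """Peak signal-to-noise ratio (dB) between two equal-length sample sequences."""
    if len(reference) != len(observed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the fault-free and the observed output.
    mse = sum((r - o) ** 2 for r, o in zip(reference, observed)) / len(reference)
    if mse == 0:
        return math.inf  # identical outputs: numerically correct
    return 10 * math.log10(max_value ** 2 / mse)


def is_acceptable(reference, observed, threshold_db=30.0):
    """Application-level correctness: fidelity meets the threshold.

    threshold_db is an assumed value chosen for illustration only.
    """
    return psnr(reference, observed) >= threshold_db


golden = [10, 20, 30, 40]        # fault-free output (made-up data)
slightly_off = [10, 21, 30, 40]  # small numerical deviation
badly_off = [10, 20, 30, 200]    # large deviation in one sample

print(is_acceptable(golden, slightly_off))
print(is_acceptable(golden, badly_off))
```

Under these assumptions, the slightly perturbed output is numerically incorrect yet still passes the fidelity check, while the badly perturbed one fails it. This is the distinction between numerical and application-level correctness that the abstract draws.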
Year
2011
DOI
10.1109/ICCAD.2011.6105319
Venue
ICCAD
Keywords
likely critical set, critical program segment, static critical instruction, critical instruction, control flow analysis, analysis technique, application-level correctness, soft error, data flow analysis, likely non-critical instruction, static analysis, thermal, timing analysis, fault tolerant, software fault tolerance, satisfiability
Field
Computer science, Profiling (computer programming), Fault detection and isolation, Correctness, Static analysis, Software fault tolerance, Data-flow analysis, Real-time computing, Control flow analysis, Program analysis, Reliability engineering
DocType
Conference
ISSN
1933-7760
ISBN
978-1-4577-1398-9
Citations
23
PageRank
1.04
References
23
Authors
2
Name | Order | Citations | PageRank
Jason Cong | 1 | 1027 | 87.55
Karthik Gururaj | 2 | 177 | 12.19