Title
Quantitative Analysis of Long-Latency Failures in System Software
Abstract
This paper presents a study of long-latency failures using accelerated fault injection. The data collected from the experiments are used to analyze the significance, causes, and characteristics of long-latency failures caused by soft errors in the processor and the memory. The results indicate that a non-negligible portion of soft errors in the code and data memory lead to long-latency failures. These failures are caused by errors with long fault activation times and by errors that cause failures only under certain runtime conditions. In contrast, less than 0.5% of soft errors in the processor registers used in kernel mode lead to a failure with latency longer than a thousand seconds. This is due to the strong temporal locality of register values. The study also shows that these insights can guide the design and placement (in the application code and/or the system) of application-specific error detectors.
Year
2009
DOI
10.1109/PRDC.2009.13
Venue
PRDC
Keywords
application-specific error detector, certain runtime condition, kernel mode lead, application code, long-latency failures, non-negligible portion, data memory lead, quantitative analysis, soft error, long latency failure, system software, long fault activation time, accelerated fault injection, data collection, servers, operating system, hardware, kernel, registers
Field
Kernel (linear algebra), System software, Locality of reference, Latency (engineering), Computer science, Server, Real-time computing, Processor register, Detector, Fault injection
DocType
Conference
Citations
8
PageRank
0.64
References
11
Authors
3
Name                   Order  Citations  PageRank
Keun Soo Yim           1      169        14.43
Zbigniew Kalbarczyk    2      1896       159.48
Ravishankar K. Iyer    3      3489       504.32