Title
Susceptibility of commodity systems and software to memory soft errors
Abstract
It is widely understood that most system downtime is accounted for by programming errors and administration time. However, a growing body of work has indicated that an increasing cause of downtime may stem from transient errors in computer system hardware due to external factors, such as cosmic rays. This work indicates that moving to denser semiconductor technologies at lower voltages has the potential to increase these transient errors. In this paper, we investigate the susceptibility of commodity operating systems and applications on commodity PC processors to these soft errors, and we introduce ideas for improved recovery from these transient errors in software. Our results indicate that, for the Linux kernel and a Java virtual machine running sample workloads, many errors are not activated, mostly due to overwriting. In addition, given current and upcoming microprocessor support, our results indicate that those errors that are activated, which would normally lead to system reboot, need not be fatal to the system if software knowledge is used for simple software recovery. Together, these results indicate the benefits of simple memory soft error recovery handling in commodity processors and software.
Year
2004
DOI
10.1109/TC.2004.119
Venue
IEEE Transactions on Computers
Keywords
Java,Linux,error handling,operating system kernels,system recovery,Java virtual machine,Linux kernel,commodity system susceptibility,memory soft errors,operating systems,software recovery,transient errors,soft errors,commodity,memory errors,recovery
Field
Reboot,Computer science,Real-time computing,Software,Memory errors,Linux kernel,Soft error,Parallel computing,Microprocessor,Downtime,Java,Operating system,Embedded system
DocType
Journal
Volume
53
Issue
12
ISSN
0018-9340
Citations
30
PageRank
2.09
References
17
Authors
4
Name            Order   Citations   PageRank
Messer, A.      1       30          2.09
Bernadat, P.    2       90          7.24
Fu, G.          3       30          2.09
DeQing Chen     4       89          6.95