Title
Intermittent Hardware Errors Recovery: Modeling and Evaluation
Abstract
The frequency of hardware errors is increasing due to shrinking feature sizes, higher levels of integration, and increasing design complexity. Intermittent errors are those that occur non-deterministically at the same location. It has been shown that intermittent hardware errors account for about 39% of all hardware failures. Intermittent faults have characteristics that differ from those of transient and permanent errors, which makes it challenging to devise efficient recovery techniques for them. In this paper, we evaluate the impact of different intermittent error recovery scenarios on processor performance. To achieve this, we model a system that consists of a fault-tolerant multicore processor subject to intermittent faults. Our fault models are based on insights from related work at the physical level. We find that the frequency of the intermittent error and the relative importance of the error location play an important role in choosing the recovery action that maximizes the processor's performance.
Year
2012
DOI
10.1109/QEST.2012.37
Venue
Quantitative Evaluation of Systems
Keywords
intermittent hardware error,error location,processor performance,hardware error,intermittent hardware errors recovery,efficient recovery technique,permanent error,fault-tolerant multicore processor subject,intermittent fault,intermittent error,different intermittent error recovery,computational complexity
Field
Error location,Computer science,System recovery,Real-time computing,Recovery - action,Computer hardware,Multi-core processor,Fault model,Computational complexity theory
DocType
Conference
ISBN
978-0-7695-4781-7
Citations
7
PageRank
0.56
References
18
Authors
3
Name | Order | Citations | PageRank
Layali Rashid | 1 | 39 | 3.41
Karthik Pattabiraman | 2 | 1030 | 55.17
Sathish Gopalakrishnan | 3 | 426 | 33.10