Title
Exploiting replicated checkpoints for soft error detection and correction
Abstract
Register renaming is a widely used technique for removing false dependencies in contemporary superscalar microprocessors. A register alias table (RAT) holds the current locations of the values that correspond to the architectural registers. Some recently designed processors take a copy of the rename table at each branch instruction so that its contents can be recovered when a misspeculation occurs. In this paper, we first investigate the vulnerability of the RAT to transient errors. We then analyze the vulnerability of RAT checkpoints and propose two techniques for soft error detection and correction that use the redundantly taken copies of entries whose content is the same as in the previous and/or next checkpoint. Simulation results on the SPEC 2006 benchmarks show that, on average, RAT vulnerability is 25% and checkpoint vulnerability is 6%. The results also show that redundancy exists across consecutive checkpoint copies and can be used for error detection and correction. We propose techniques that exploit this redundancy and show that faults in 41% of all checkpoints and 44% of rolled-back checkpoints can be detected, and that errors in 33% of the rolled-back checkpoints can be corrected. Because they exploit storage that is already available, the proposed error detection and correction techniques can be implemented with minimal hardware overhead.
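The idea of reusing neighbouring checkpoints as redundant copies can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism from the paper: the `Checkpoint` class, the `same_as_prev` flags, and the `check_entry` function are hypothetical names introduced here, and the flags themselves are assumed to be error free. An entry that was not renamed between two branches has an identical copy in the adjacent checkpoint, so two copies allow detection by comparison and three copies allow correction by majority vote.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Checkpoint:
    # Physical register tag for each architectural register.
    tags: List[int]
    # Hypothetical bookkeeping (an assumption of this sketch): True when the
    # entry was unchanged since the previous checkpoint was taken.
    same_as_prev: List[bool]


def take_checkpoint(rat: List[int], prev: Optional[Checkpoint]) -> Checkpoint:
    """Copy the RAT and record which entries match the previous checkpoint."""
    same = [prev is not None and prev.tags[i] == tag for i, tag in enumerate(rat)]
    return Checkpoint(tags=list(rat), same_as_prev=same)


def check_entry(ckpts: List[Checkpoint], k: int, i: int) -> Tuple[str, int]:
    """Check entry i of checkpoint k against its redundant neighbour copies.

    Returns (status, value), where status is one of:
      "unprotected": no neighbour holds a redundant copy
      "ok":          all available copies agree
      "detected":    copies disagree but no majority exists
      "corrected":   copies disagree and a majority vote picks the value
    """
    copies = [ckpts[k].tags[i]]
    if k > 0 and ckpts[k].same_as_prev[i]:
        copies.append(ckpts[k - 1].tags[i])
    if k + 1 < len(ckpts) and ckpts[k + 1].same_as_prev[i]:
        copies.append(ckpts[k + 1].tags[i])

    value = copies[0]
    if len(copies) == 1:
        return "unprotected", value
    if all(c == value for c in copies):
        return "ok", value
    winner, votes = Counter(copies).most_common(1)[0]
    if len(copies) == 3 and votes >= 2:
        return "corrected", winner
    return "detected", value


# Three checkpoints taken at successive branches (tag values are made up).
c0 = take_checkpoint([10, 11, 12, 13], None)
c1 = take_checkpoint([10, 20, 12, 13], c0)   # register 1 renamed
c2 = take_checkpoint([10, 20, 30, 13], c1)   # register 2 renamed
ckpts = [c0, c1, c2]

c1.tags[0] ^= 1                       # inject a single-bit fault
print(check_entry(ckpts, 1, 0))       # three copies exist, so the vote restores 10
```

In this model an entry with a single redundant neighbour can only be flagged as faulty, while an entry shared with both the previous and the next checkpoint can be repaired, which mirrors the distinction the abstract draws between detection and correction.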
Year
2013
DOI
10.7873/DATE.2013.304
Venue
DATE
Keywords
error detection and correction, benchmark testing, hardware, soft error, registers, redundancy
Field
Alias, Soft error, Computer science, Parallel computing, Error detection and correction, Exploit, Real-time computing, Redundancy (engineering), Register renaming, Spec#, Rename
DocType
Conference
ISSN
1530-1591
Citations
0
PageRank
0.34
References
13
Authors
4
Name             Order  Citations  PageRank
Fahrettin Koc    1      2          2.14
Kenan Bozdas     2      0          0.68
Burak Karsli     3      0          0.34
Oguz Ergin       4      424        25.84