Title
R2D3: A Reliability Engine for 3D Parallel Systems
Abstract
This paper proposes R2D3, a holistic reliability management engine for parallel 3D systems based on post-Moore technologies, which have low yield and high failure rates. The proposed engine, comprising a controller, reconfigurable crossbars, and detection circuitry, provides concurrent single-replay detection and diagnosis, fault-mitigating repair, and aging-aware lifetime management at runtime. We show that R2D3 achieves 96% coverage of defects, repairs faulty cores, and reduces threshold voltage (Vth) degradation by 53%. This leads to a 78% performance improvement over 8 years and a 2.16x longer mean time to failure compared with a baseline 8-core 3D processor with no reliability management.
Year: 2020
DOI: 10.1109/DAC18072.2020.9218497
Venue: Proceedings of the 2020 57th ACM/EDAC/IEEE Design Automation Conference (DAC)
DocType: Conference
ISSN: 0738-100X
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name                  Order  Citations  PageRank
Javad Bagherzadeh     1      1          1.41
Aporva Amarnath       2      39         5.18
Jielun Tan            3      3          2.41
Subhankar Pal         4      32         5.27
Ronald G. Dreslinski  5      1258       81.02