Abstract | ||
---|---|---|
This paper proposes a holistic reliability management engine, R2D3, for parallel 3D systems built on post-Moore technologies, which suffer from low yield and high failure rates. The proposed engine, comprising a controller, reconfigurable crossbars, and detection circuitry, provides concurrent single-replay detection and diagnosis, fault-mitigating repair, and aging-aware lifetime management at runtime. We show that R2D3 achieves 96% coverage of defects, repairs faulty cores, and reduces threshold voltage (V_th) degradation by 53%. This leads to a 78% performance improvement over 8 years and a 2.16x longer mean-time-to-failure compared with a baseline 8-core 3D processor with no reliability management. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/DAC18072.2020.9218497 | PROCEEDINGS OF THE 2020 57TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC) |

DocType | ISSN | Citations |
---|---|---|
Conference | 0738-100X | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 0 | 5 |

Name | Order | Citations | PageRank |
---|---|---|---|
Javad Bagherzadeh | 1 | 1 | 1.41 |
Aporva Amarnath | 2 | 39 | 5.18 |
Jielun Tan | 3 | 3 | 2.41 |
Subhankar Pal | 4 | 32 | 5.27 |
Ronald G. Dreslinski | 5 | 1258 | 81.02 |