Abstract | ||
---|---|---|
Advances in technology scaling increasingly make Networks-on-Chip (NoCs) more susceptible to failures that cause various reliability challenges. With on-chip memories occupying a growing share of chip area, maintaining fault tolerance of distributed on-chip memories becomes a major design challenge. We propose a system-level design methodology for scalable fault tolerance of distributed on-chip memories in NoCs. We introduce a novel reliability clustering model for fault-tolerance analysis and shared redundancy management of on-chip memory blocks. We perform extensive design space exploration, applying the proposed reliability clustering to a block-redundancy fault-tolerant scheme to evaluate the tradeoffs between reliability, performance, and overheads. Evaluations on a 64-core chip multiprocessor (CMP) with an 8x8 mesh NoC show that distinct strategies in our case study may yield performance improvements of up to 20% and energy savings of up to 25% across different benchmarks, and uncover interesting design configurations. |
Year | DOI | Venue |
---|---|---|
2013 | 10.7873/DATE.2013.326 | DATE |
Keywords | Field | DocType |
reliability engineering,system on chip,redundancy | Computer architecture,Computer science,Parallel computing,Real-time computing,Design methods,Multiprocessing,Chip,Fault tolerance,Redundancy (engineering),Cluster analysis,Design space exploration,Scalability | Conference |
ISSN | Citations | PageRank |
1530-1591 | 2 | 0.36 |
References | Authors | |
18 | 3 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Abbas Banaiyanmofrad | 1 | 53 | 4.31 |
Nikil Dutt | 2 | 4960 | 421.49 |
Gustavo Girão | 3 | 26 | 4.10 |