Title
Modeling and analysis of fault-tolerant distributed memories for networks-on-chip
Abstract
Advances in technology scaling make Networks-on-Chip (NoCs) increasingly susceptible to failures, which raises a range of reliability challenges. As on-chip memories occupy a growing share of chip area, maintaining the fault tolerance of distributed on-chip memories becomes a major design challenge. We propose a system-level design methodology for scalable fault tolerance of distributed on-chip memories in NoCs. We introduce a novel reliability clustering model for fault-tolerance analysis and shared redundancy management of on-chip memory blocks. We perform extensive design space exploration, applying the proposed reliability clustering to a block-redundancy fault-tolerant scheme to evaluate the tradeoffs between reliability, performance, and overheads. Evaluations on a 64-core chip multiprocessor (CMP) with an 8x8 mesh NoC show that distinct strategies in our case study may yield up to a 20% improvement in performance and a 25% improvement in energy savings across different benchmarks, and uncover interesting design configurations.
Year: 2013
DOI: 10.7873/DATE.2013.326
Venue: DATE
Keywords: reliability engineering, system on chip, redundancy
Field: Computer architecture, Computer science, Parallel computing, Real-time computing, Design methods, Multiprocessing, Chip, Fault tolerance, Redundancy (engineering), Cluster analysis, Design space exploration, Scalability
DocType: Conference
ISSN: 1530-1591
Citations: 2
PageRank: 0.36
References: 18
Authors: 3
Name                  Order  Citations  PageRank
Abbas Banaiyanmofrad  1      53         4.31
Nikil Dutt            2      4960       421.49
Gustavo Girão         3      26         4.10