Title
FaultSim: A Fast, Configurable Memory-Reliability Simulator for Conventional and 3D-Stacked Systems.
Abstract
As memory systems scale, maintaining their Reliability Availability and Serviceability (RAS) is becoming more complex. To make matters worse, recent studies of DRAM failures in data centers and supercomputer environments have highlighted that large-granularity failures are common in DRAM chips. Furthermore, the move toward 3D-stacked memories can make the system vulnerable to newer failure modes, such as those occurring from faults in Through-Silicon Vias (TSVs). To architect future systems and to use emerging technology, system designers will need to employ strong error correction and repair techniques. Unfortunately, evaluating the relative effectiveness of these reliability mechanisms is often difficult and is traditionally done with analytical models, which are both error prone and time-consuming to develop. To this end, this article proposes FaultSim, a fast configurable memory-reliability simulation tool for 2D and 3D-stacked memory systems. FaultSim employs Monte Carlo simulations, which are driven by real-world failure statistics. We discuss the novel algorithms and data structures used in FaultSim to accelerate the evaluation of different resilience schemes. We implement BCH-1 (SECDED) and ChipKill codes using FaultSim and validate against an analytical model. FaultSim implements BCH-1 and ChipKill codes with a deviation of only 0.032% and 8.41% from the analytical model. FaultSim can simulate 1 million Monte Carlo trials (each for a period of 7 years) of BCH-1 and ChipKill codes in only 34 seconds and 33 seconds, respectively.
Year
DOI
Venue
2016
10.1145/2831234
TACO
Keywords
DocType
Volume
Error correcting codes,monte carlo simulation,stacked memory,reliability,through silicon vias
Journal
12
Issue
ISSN
Citations 
4
1544-3566
6
PageRank 
References 
Authors
0.43
8
3
Name
Order
Citations
PageRank
Prashant J. Nair134615.74
David A. Roberts2893.52
Moinuddin K. Qureshi32639110.61