Title
Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics
Abstract
Increasing single-cell DRAM error rates have pushed DRAM manufacturers to adopt on-die error-correction coding (ECC), which operates entirely within a DRAM chip to improve factory yield. The on-die ECC function and its effects on DRAM reliability are considered trade secrets, so only the manufacturer knows precisely how on-die ECC alters the externally-visible reliability characteristics. Consequently, on-die ECC obstructs third-party DRAM customers (e.g., test engineers, experimental researchers), who typically design, test, and validate systems based on these characteristics. To give third parties insight into precisely how on-die ECC transforms DRAM error patterns during error correction, we introduce Bit-Exact ECC Recovery (BEER), a new methodology for determining the full DRAM on-die ECC function (i.e., its parity-check matrix) without hardware tools, prerequisite knowledge about the DRAM chip or on-die ECC mechanism, or access to ECC metadata (e.g., error syndromes, parity information). BEER exploits the key insight that non-intrusively inducing data-retention errors with carefully-crafted test patterns reveals behavior that is unique to a specific ECC function. We use BEER to identify the ECC functions of 80 real LPDDR4 DRAM chips with on-die ECC from three major DRAM manufacturers. We evaluate BEER's correctness in simulation and performance on a real system to show that BEER is effective and practical across a wide range of on-die ECC functions. To demonstrate BEER's value, we propose and discuss several ways that third parties can use BEER to improve their design and testing practices. As a concrete example, we introduce and evaluate BEEP, the first error profiling methodology that uses the known on-die ECC function to recover the number and bit-exact locations of unobservable raw bit errors responsible for observable post-correction errors.
Year
2020
DOI
10.1109/MICRO50266.2020.00034
Venue
2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Keywords
DRAM, Error Correction Codes, ECC, Data Retention Errors, On-Die ECC, Reliability, Analysis, Simulation, Error Characterization, Testing, Memory
DocType
Conference
ISBN
978-1-7281-7384-9
Citations
4
PageRank
0.37
References
50
Authors
5
Name            Order  Citations  PageRank
Minesh Patel    1      204        9.82
Jeremie Kim     2      263        13.68
Taha Shahroodi  3      24         1.33
Hasan Hassan    4      352        17.76
Onur Mutlu      5      9446       357.40