Title
Killi: Runtime Fault Classification to Deploy Low Voltage Caches without MBIST
Abstract
Supply voltage (V-DD) scaling is one of the most effective mechanisms to reduce energy consumption in high-performance microprocessors. However, V-DD scaling is challenging for SRAM-based on-chip memories such as caches due to persistent failures at low voltage (LV). Previously designed LV-enabling mechanisms require additional Memory Built-in Self-Test (MBIST) steps, employed either offline or online to identify persistent failures for every LV operating mode. However, these additional MBIST steps are time consuming, resulting in extended boot time or delayed power state transitions. Furthermore, most prior techniques combine MBIST-based solutions with customized Error Correction Codes (ECC), which suffer from non-trivial area or performance overheads. In this paper, we highlight the practical challenges for deploying LV techniques and propose a new low-cost error protection scheme, called Killi, which leverages conventional ECC and parity to enable LV operation. Foremost, the failing lines are discovered dynamically at runtime using both parity and ECC, negating the need for extra MBIST testing. Killi then provides on-demand error protection by decoupling cheap error detection from expensive error correction. Killi provides error detection capability to all lines using parity but employs Single Error Correction, Double Error Detection (SECDED) ECC for a subset of the lines with a single LV fault. All lines with more than one fault are disabled. We evaluate this completely hardware-enclosed solution on a GPU write-through L2 cache and show that the V-min (minimum reliable V-DD) can be reduced to 62.5% of nominal V-DD when operating at 1GHz with only a maximum of 0.8% performance degradation. As a result, an 8-CU GPU with Killi can reduce the power consumption of the L2 cache by 59.3% compared to the baseline L2 cache running at nominal V-DD. In addition, Killi reduces the error protection area overhead by 50% compared to SECDED ECC.
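The protection policy stated in the abstract (parity on every line, SECDED only for lines with a single LV fault, lines with more than one fault disabled) can be illustrated with a minimal Python sketch. This is not the paper's hardware design: the names `LineState`, `classify_line`, and `parity_bit` are hypothetical, and the sketch assumes the per-line fault count is already known, whereas Killi discovers faults at runtime through a mechanism the abstract does not detail.

```python
from enum import Enum


class LineState(Enum):
    """Hypothetical protection classes for a cache line (names are illustrative)."""
    PARITY_ONLY = "parity only: no LV fault observed"
    SECDED = "parity plus SECDED ECC: one LV fault"
    DISABLED = "disabled: more than one LV fault"


def classify_line(fault_count: int) -> LineState:
    """Map a line's observed LV fault count to a protection class.

    Assumption: the fault count is given. How it is learned at runtime
    is not specified in the abstract and is not modeled here.
    """
    if fault_count < 0:
        raise ValueError("fault_count must be non-negative")
    if fault_count == 0:
        return LineState.PARITY_ONLY
    if fault_count == 1:
        return LineState.SECDED
    return LineState.DISABLED


def parity_bit(word: int) -> int:
    """Even-parity bit of a non-negative integer: 1 if the count of set bits is odd."""
    return bin(word).count("1") % 2


if __name__ == "__main__":
    for faults in (0, 1, 2):
        print(faults, "->", classify_line(faults).name)

    # A single flipped bit changes the parity, which is what makes
    # parity a cheap detector for single-bit errors.
    stored = 0b10110010
    corrupted = stored ^ (1 << 3)
    print("parity mismatch:", parity_bit(stored) != parity_bit(corrupted))
```

Parity can only detect an odd number of flipped bits and cannot correct any, which is consistent with the abstract's split between cheap detection for all lines and more expensive SECDED correction for the subset that needs it.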
Year
2019
DOI
10.1109/HPCA.2019.00046
Venue
2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Keywords
Error correction codes, Circuit faults, Graphics processing units, Random access memory, Computer architecture, Microprocessors, Error correction
Field
Computer science, Parallel computing, Error detection and correction, Low voltage, Embedded system
DocType
Conference
ISSN
1530-0897
ISBN
978-1-7281-1444-6
Citations
0
PageRank
0.34
References
0
Authors
5
Name                  Order  Citations  PageRank
Shrikanth Ganapathy   1      14         2.22
J. Kalamatianos       2      32         4.19
Bradford Beckmann     3      2390       101.06
Steven Raasch         4      21         2.80
Lukasz G. Szafaryn    5      0          0.34