Title
BlockHammer: Improving Flash Reliability by Exploiting Process Variation Aware Proactive Failure Prediction
Abstract
NAND flash-based storage devices have become widely used in recent years. Unfortunately, flash blocks have limited endurance. To guarantee reliability, flash manufacturers specify a fixed number of program/erase (P/E) cycles as the endurance of all flash blocks within the same chip. To extend the service lifetime of a flash-based device, existing works likewise assume that all flash blocks have the same endurance and adopt P/E-based wear-leveling algorithms in the controller, which distribute P/E cycles evenly across flash blocks. However, many studies indicate that flash blocks exhibit wide endurance differences due to variation in the fabrication process. Because the specified endurance is limited by the weakest block, the traditional P/E-based block retirement mechanism leaves flash blocks underutilized. To fully exploit the endurance of all blocks and improve the reliability of flash devices, we present BlockHammer, a process variation aware proactive failure prediction scheme. BlockHammer takes process variation and block similarity into consideration and consists of a block classifier and a block lifetime predictor. Using machine learning, we first build a block classifier that groups flash blocks into different classes. Based on the classification results, we then build a block lifetime prediction model for each class; flash blocks belonging to the same class share the same model. To verify the effectiveness of BlockHammer, we collect block data from a real NAND flash-based testing platform that emulates realistic application scenarios of NAND flash. We compare the predicted values with the tested values; the experimental results show that the proposed proactive failure prediction scheme achieves more than 92% accuracy for flash blocks. Therefore, BlockHammer can accurately predict the block failure point in advance, which greatly enhances the reliability of NAND flash.
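The two-stage idea in the abstract (classify blocks first, then apply one lifetime model per class) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature choice (erase latency and an early raw bit error count), the nearest-centroid classifier, the per-class least-squares line, the function names, and all numbers are hypothetical and are not taken from the paper, which does not specify its models in the abstract.

```python
from statistics import mean

# Hypothetical training data: (class label, (erase latency in ms, early bit errors), lifetime in P/E cycles).
# Labels are assumed to be known here; all values are invented for illustration.
TRAINING_BLOCKS = [
    ("weak", (3.0, 40), 3000), ("weak", (3.2, 50), 2500), ("weak", (3.1, 45), 2750),
    ("strong", (2.0, 10), 9000), ("strong", (2.1, 20), 8000), ("strong", (1.9, 15), 8500),
]

def nearest_centroid(features, centroids):
    """Return the label of the centroid closest to `features` (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((f - c) ** 2 for f, c in zip(features, centroids[label])),
    )

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return a, y_bar - a * x_bar

def train(blocks):
    """Build one centroid and one shared lifetime model per class."""
    by_class = {}
    for label, feats, life in blocks:
        by_class.setdefault(label, []).append((feats, life))
    centroids, models = {}, {}
    for label, rows in by_class.items():
        centroids[label] = tuple(mean(col) for col in zip(*(f for f, _ in rows)))
        # Assumption: lifetime is modeled as a line in the bit error feature only.
        models[label] = fit_line([f[1] for f, _ in rows], [life for _, life in rows])
    return centroids, models

def predict_lifetime(features, centroids, models):
    """Stage 1: pick the class. Stage 2: apply that class's lifetime model."""
    label = nearest_centroid(features, centroids)
    a, b = models[label]
    return label, a * features[1] + b

centroids, models = train(TRAINING_BLOCKS)
label, lifetime = predict_lifetime((2.05, 12), centroids, models)
print(label, round(lifetime))
```

In this sketch a block predicted to outlast the chip-wide P/E limit would stay in service longer, while a weak block would be retired earlier; how BlockHammer actually acts on its predictions is not described in the abstract.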
Year: 2020
DOI: 10.1109/TCAD.2020.2981025
Venue: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Keywords: Endurance, failure, machine learning, NAND flash, prediction
DocType: Journal
Volume: 39
Issue: 12
ISSN: 0278-0070
Citations: 1
PageRank: 0.38
References: 0
Authors: 7
Name             Order  Citations  PageRank
Ruixiang Ma      1      2          1.43
Fei Wu           2      104        35.76
Zhonghai Lu      3      1063       100.12
Wenmin Zhong     4      1          0.38
Qiulin Wu        5      1          0.38
Jiguang Wan      6      29         9.71
Changsheng Xie   7      32         9.93