Title
Monitoring The Health Of Emerging Neural Network Accelerators With Cost-Effective Concurrent Test
Abstract
ReRAM-based neural network accelerators are a promising solution for memory- and computation-intensive deep learning workloads. However, they suffer from unique device errors, which can accumulate to massive levels at run time and cause a significant accuracy drop. Obtaining an accelerator's fault status in real time is therefore crucial before any repair mechanism can be applied. Collecting such statistical information is non-trivial, because it requires a large number of test patterns, long test time, and high test coverage, given that complex errors may appear across millions to billions of weight parameters. In this paper, we leverage the concept of corner data, inputs that can significantly confuse the decision making of a neural network model, together with the training algorithm, to generate a small set of test patterns tuned to be sensitive to different levels of error accumulation and accuracy loss. Experimental results show that our method can quickly and correctly report the fault status of a running accelerator, outperforming existing solutions in both detection efficiency and cost.
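The intuition behind corner data can be illustrated with a toy sketch. This is not the authors' method: a linear classifier stands in for the neural network, additive Gaussian weight drift stands in for ReRAM device errors, and corner patterns are chosen by smallest decision margin rather than generated with the training algorithm. All function names and parameters here are hypothetical.

```python
import random

def score(weights, x):
    """Dot product of weights and input (toy stand-in for a network output)."""
    return sum(w * xi for w, xi in zip(weights, x))

def predict(weights, x):
    """Binary decision from the sign of the score."""
    return 1 if score(weights, x) >= 0 else -1

def margin(weights, x):
    """Distance of the score from the decision boundary."""
    return abs(score(weights, x))

def inject_faults(weights, sigma, rng):
    """Assumed fault model: additive Gaussian drift on every weight."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

def select_corner_patterns(weights, candidates, k):
    """Pick the k candidates closest to the decision boundary."""
    return sorted(candidates, key=lambda x: margin(weights, x))[:k]

def mismatch_rate(golden, faulty, patterns):
    """Fraction of patterns whose decision differs from the fault-free one."""
    flips = sum(predict(golden, x) != predict(faulty, x) for x in patterns)
    return flips / len(patterns)

def average_mismatch(golden, patterns, sigma, trials, rng):
    """Average mismatch rate over several independent fault injections."""
    total = 0.0
    for _ in range(trials):
        faulty = inject_faults(golden, sigma, rng)
        total += mismatch_rate(golden, faulty, patterns)
    return total / trials

rng = random.Random(0)
dim = 16
golden = [rng.uniform(-1, 1) for _ in range(dim)]
candidates = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2000)]

corner = select_corner_patterns(golden, candidates, 20)   # low-margin test set
ordinary = rng.sample(candidates, 20)                     # same-size random set

corner_rate = average_mismatch(golden, corner, 0.05, 200, rng)
ordinary_rate = average_mismatch(golden, ordinary, 0.05, 200, rng)
print("corner mismatch rate:  ", corner_rate)
print("ordinary mismatch rate:", ordinary_rate)
```

In this sketch, the low-margin patterns flip their decisions far more readily under weight drift than randomly chosen inputs of the same count, which is why a small, carefully chosen test set can serve as a sensitive fault indicator.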
Year
2020
DOI
10.1109/DAC18072.2020.9218675
Venue
PROCEEDINGS OF THE 2020 57TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC)
DocType
Conference
ISSN
0738-100X
Citations
0
PageRank
0.34
References
0
Authors
5
Name          Order  Citations  PageRank
Qi Liu        1      17         3.67
Tao Liu       2      45         7.40
Zihao Liu     3      34         5.45
Wujie Wen     4      300        30.61
Chengmo Yang  5      302        32.31