Title
Reliability Evaluation of Compressed Deep Learning Models
Abstract
Neural networks are becoming deeper and more complex, making them harder to store and process on systems with limited resources. Model pruning and data quantization are two effective ways to simplify the required hardware: pruning compresses the network so that only the relevant nodes remain, and quantization reduces the required data precision. Such optimizations, however, may come at the cost of reliability, since the remaining critical nodes are more exposed to faults and the network becomes more sensitive to small changes. In this work, we present an extensive empirical investigation of transient faults in compressed deep convolutional neural networks (CNNs). We evaluate the impact of a single bit flip across three CNN models with different sparsity configurations and integer-only quantizations. We show that pruning can increase the resilience of the system by 9× compared to the dense model. Quantization can outperform the 32-bit floating-point baseline, adding 27.4× more resilience to the overall network and up to 108.7× when combined with pruning. This makes model compression an effective way to provide resilience to deep learning workloads during inference, mitigating the need for explicit error-correction hardware.
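The following is a minimal illustrative sketch, not the authors' fault-injection framework: it only shows the general idea of injecting a single transient bit flip into an int8-quantized weight tensor, as studied in the paper. The function name flip_single_bit, the example kernel shape, and the use of NumPy are assumptions made for illustration.

import numpy as np

def flip_single_bit(weights_int8, rng):
    """Return a copy of an int8 weight tensor with one randomly chosen bit flipped."""
    faulty = weights_int8.copy()
    flat = faulty.view(np.uint8).reshape(-1)   # view as uint8 so the sign bit can be flipped safely
    idx = rng.integers(flat.size)              # pick one weight
    bit = rng.integers(8)                      # pick one of its 8 bits
    flat[idx] ^= np.uint8(1 << bit)            # inject the transient fault
    return faulty

# Hypothetical usage: inject one fault into an example int8 conv kernel.
rng = np.random.default_rng(0)
weights = rng.integers(-128, 128, size=(64, 3, 3, 3), dtype=np.int8)
faulty = flip_single_bit(weights, rng)
print("weights changed:", int(np.count_nonzero(weights != faulty)))   # prints 1

In a resilience study of this kind, such an injection would typically be repeated many times, each time comparing the faulty model's predictions against the fault-free ones to estimate the probability that a single bit flip corrupts the output.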
Year
2020
DOI
10.1109/LASCAS45839.2020.9069026
Venue
2020 IEEE 11th Latin American Symposium on Circuits & Systems (LASCAS)
Keywords
Resilience, Soft Error, Transient Fault, Neural Network, Deep Learning
DocType
Conference
ISSN
2330-9954
ISBN
978-1-7281-3428-4
Citations
2
PageRank
0.39
References
0
Authors
9