Title
FT-DeepNets: Fault-Tolerant Convolutional Neural Networks with Kernel-based Duplication
Abstract
Deep neural network (deepnet) applications play a crucial role in safety-critical systems such as autonomous vehicles (AVs). An AV must drive safely towards its destination, avoiding obstacles, and respond quickly when the vehicle must stop. Any transient errors in software calculations or hardware memory in these deepnet applications can potentially lead to dramatically incorrect results. Therefore, assessing and mitigating any transient errors and providing robust results are important for safety-critical systems. Previous research on this subject focused on detecting errors and then recovering from the errors by re-running the network. Other approaches were based on the extent of full network duplication such as the ensemble learning-based approach to boost system fault-tolerance by leveraging each model's advantages. However, it is hard to detect errors in a deep neural network, and the computational overhead of full redundancy can be substantial. We first study the impact of the error types and locations in deepnets. We next focus on selecting which part should be duplicated using multiple ranking methods to measure the order of importance among neurons. We find that the duplication overhead for computation and memory is a tradeoff between algorithmic performance and robustness. To achieve higher robustness with less system overhead, we present two error protection mechanisms that only duplicate parts of the network from critical neurons. Finally, we substantiate the practical feasibility of our approach and evaluate the improvement in the accuracy of a deepnet in the presence of errors. We demonstrate these results using a case study with real-world applications on an Nvidia GeForce RTX 2070Ti GPU and an Nvidia Xavier embedded platform used by automotive OEMs.
Year
DOI
Venue
2022
10.1109/WACV51458.2022.00194
2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022)
DocType
ISSN
Citations 
Conference
2472-6737
0
PageRank 
References 
Authors
0.34
0
5
Name
Order
Citations
PageRank
Iljoo Baek111.38
Wei Chen200.34
Zhihao Zhu300.34
Soheil Samii400.34
Ragunathan Raj Rajkumar522.06