Title
Low-Cost Online Convolution Checksum Checker
Abstract
Managing random hardware faults requires the faults to be detected online, thus simplifying recovery. Algorithm-based fault tolerance has been proposed as a low-cost mechanism to check online the result of computations against random hardware failures. In this case, the checksum of the actual result is checked against a predicted checksum computed in parallel by a hardware checker. In this work, we target the design of such checkers for convolution engines that are currently the most critical building block in image processing and computer vision applications. The proposed convolution checksum checker, named ConvGuard, utilizes a newly introduced invariance condition of convolution to predict implicitly the output checksum using only the pixels at the border of the input image. In this way, ConvGuard reduces the power required for accumulating the input pixels without requiring large buffers to hold intermediate checksum results. The design of ConvGuard is generic and can be configured for different output sizes and strides. The experimental results show that ConvGuard utilizes only a small percentage of the area/power of an efficient convolution engine while being significantly smaller and more power efficient than a state-of-the-art checksum checker for various practical cases.
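The checksum comparison described in the abstract can be illustrated with a minimal software sketch. It shows only the generic algorithm-based fault tolerance idea: by linearity, the sum of all convolution outputs equals the sum over kernel weights of each weight times the sum of the input pixels it touches. It is not ConvGuard's border-pixel invariance condition, which the abstract does not spell out. The function names, the square kernel, the no-padding convention, and the integer test data are all assumptions made for illustration.

```python
def conv2d_valid(image, kernel, stride=1):
    """Direct 2-D convolution (cross-correlation form, no padding).
    Assumes a square kernel; hypothetical helper for illustration."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [[sum(kernel[i][j] * image[y * stride + i][x * stride + j]
                 for i in range(k) for j in range(k))
             for x in range(out_w)]
            for y in range(out_h)]


def predicted_checksum(image, kernel, stride=1):
    """Predict the sum of all outputs from the inputs alone.
    For each weight (i, j), accumulate the input pixels it multiplies,
    then scale that partial sum by the weight."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    total = 0
    for i in range(k):
        for j in range(k):
            window_sum = sum(image[y * stride + i][x * stride + j]
                             for y in range(out_h) for x in range(out_w))
            total += kernel[i][j] * window_sum
    return total


def checksum_matches(output, predicted):
    """Compare the actual output checksum with the predicted one."""
    return sum(sum(row) for row in output) == predicted


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    output = conv2d_valid(image, kernel)
    predicted = predicted_checksum(image, kernel)
    print(checksum_matches(output, predicted))  # fault-free result
    output[0][0] += 1                           # inject a single-value fault
    print(checksum_matches(output, predicted))  # corrupted result
```

This naive predictor accumulates input pixels separately for every kernel weight. According to the abstract, reducing that accumulation cost, by working only from the pixels at the border of the input image, is what ConvGuard targets.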
Year
2022
DOI
10.1109/TVLSI.2021.3119511
Venue
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Keywords
Algorithm-based fault tolerance, convolution, error detection, reliability
DocType
Journal
Volume
30
Issue
2
ISSN
1063-8210
Citations
1
PageRank
0.38
References
0
Authors
5