Title
Recovery Schemes for Mesh Arrays Utilizing Dedicated Spares
Abstract
Error recovery capability is examined in processing arrays that employ spare nodes for fault tolerance. Spares can provide fault tolerance to high-performance single-package arrays, where it is not feasible to repair faulty subsystems. The cost of such a fault-tolerance solution, redundant hardware that idles until needed, may not be practical. Manufacturers must be offered hardware solutions to fault tolerance that provide useful work at all times. In this paper, new schemes are presented in which idling spares are used to improve error recovery. Without expedient error recovery, computation in environments experiencing frequent errors can incur extra cost in job completion time, and a job may never reach completion at all. Spares aid in the validation and selection of recovery points in systems experiencing randomly distributed errors. In environments with error bursts, successful job completion is achieved with the aid of an existing scheme that identifies reliable data when periodic on-line testing is available, and spares help identify the boundaries of that reliable data. We consider these features in mesh arrays that are used in digital signal processing applications. Preliminary simulations highlight the overhead of our schemes in terms of job completion times in environments burdened with transient errors.
Year
1996
DOI
10.1109/DFTVS.1996.572039
Venue
IEEE Transactions on Reliability
Keywords
frequent error, recovery point, fault tolerance, dedicated spares, successful job completion, error recovery capability, job completion time, error burst, recovery schemes, error recovery, reliable data, expedient error recovery, VLSI, digital signal processing, manufacturing, fault tolerant, hardware, testing
Field
Digital signal processing, Spare part, System recovery, Computer science, Electronic engineering, Computer errors, Real-time computing, Fault tolerance, Very-large-scale integration, Distributed computing, Computation
DocType
Conference
Volume
53
Issue
4
ISBN
0-8186-7545-4
Citations
0
PageRank
0.34
References
6
Authors
3
Name, Order, Citations, PageRank
S. R. Goldberg, 1, 0, 0.34
S. J. Upadhyaya, 2, 42, 8.04
W. Kent Fuchs, 3, 1469, 279.02