Title
Toward Systematic Design of Fault-Tolerant Systems
Abstract
The mid-century "space race" was a major impetus for the development of fault-tolerant computing. Over the succeeding 25 years researchers expanded the concept of fault tolerance and refined the techniques for achieving it. Nevertheless, the bottom-up approach, entailing an infrastructure of autonomously fault-tolerant subsystems integrated with global fault tolerance functions, is less common today than the top-down approach, which relies on off-the-shelf (OTS) subsystems and a global monitoring function. A design paradigm for the systematic treatment of fault tolerance involves four steps: specification, implementation, evaluation, and modification. The paradigm offers a way to minimize the probability of oversights, mistakes, and inconsistencies that may occur during the implementation of fault tolerance. In spite of the long-range merits of this bottom-up approach, time and cost constraints often lead developers to use OTS subsystems when designing systems that are expected to be highly dependable. Even the Pentium Pro, which appears to have the most complete set of fault tolerance functions among contemporary microprocessors, has major drawbacks. Moreover, systems built from OTS subsystems are difficult to retrofit for fault tolerance. Without hardware support for fault tolerance, the only solution is to build a software monitor subsystem that tries to check all subsystems for indications of failure. But the monitor itself is unprotected because it resides and executes on an OTS processor. Researchers would do well to consider the human immune system as a model for systems in which fault tolerance is an integral attribute of every hardware element.
Year
1997
DOI
10.1109/2.585154
Venue
IEEE Computer
Keywords
fault-tolerant computing, design paradigm, OTS subsystems, top-down approach, fault tolerance function, OTS processor, global fault tolerance function, Fault-Tolerant Systems, autonomously fault-tolerant, bottom-up approach, fault tolerance, Systematic Design
DocType
Journal
Volume
30
Issue
4
ISSN
0018-9162
Citations
78
PageRank
7.77
References
6
Authors
1
Name
Order
Citations
PageRank
Algirdas Avizienis 13116351.14