Title
Fault tolerance by means of external monitoring of computer systems
Abstract
A frequently suggested solution to the problem of increasing the reliability of an already existing computer system (to be called the object machine [OM]) is to employ a functionally and physically separate monitor computer (to be called the monitor machine [MM]) that probes the operation of the OM in real time. The purpose of the monitoring is to assure that the functional performance of the OM does not deviate from the behavior specified by its design and by the programs being executed. This paper systematically assesses the architectural and fault-tolerance issues that have to be resolved to effectively implement the monitoring process. The goal of the implementation is to create an integrated and uniformly fault-tolerant OM/MM complex, beginning with a given OM design. Four principal problems are addressed in the subsequent sections: (1) implementation of the monitor machine; (2) implementation of the monitoring (OM/MM) interface; (3) specification of the monitoring function; and (4) the cost and effectiveness of monitoring. The paper concludes with examples of model technical specifications for the architectural properties needed by the OM and the MM to attain a fault-tolerant implementation of the monitoring process.
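The abstract describes the monitor machine checking that the object machine's behavior does not deviate from what its design specifies. The sketch below is a minimal illustration of that idea and is not the paper's method. It assumes a hypothetical specification expressed as allowed state transitions, and a hypothetical monitoring interface that reports each new OM state to the MM. The names `MonitorMachine`, `observe`, and the example states are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical specification: for each OM state, the set of states the
# design permits the OM to enter next. Not taken from the paper.
Spec = dict[str, set[str]]


@dataclass
class MonitorMachine:
    spec: Spec
    state: str  # last OM state reported over the (assumed) monitoring interface
    errors: list[tuple[str, str]] = field(default_factory=list)

    def observe(self, new_state: str) -> bool:
        """Check one reported OM transition against the specification.

        Returns True if the transition is allowed; otherwise records the
        deviation as (from_state, to_state) and returns False.
        """
        allowed = self.spec.get(self.state, set())
        ok = new_state in allowed
        if not ok:
            self.errors.append((self.state, new_state))
        # Follow the OM's reported state either way so checking can continue.
        self.state = new_state
        return ok


# Usage: a toy instruction cycle in which the OM once skips back to "decode".
spec = {"fetch": {"decode"}, "decode": {"execute"}, "execute": {"fetch", "halt"}}
mm = MonitorMachine(spec, "fetch")
results = [mm.observe(s) for s in ["decode", "execute", "decode", "execute", "halt"]]
print(results, mm.errors)
```

Checking transitions rather than whole states keeps the MM's per-observation work constant; a real OM/MM complex would also have to tolerate faults in the MM and in the monitoring interface itself, which this sketch does not model.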
Year
1981
Venue
AFIPS Conference Proceedings; vol. 55 1986 National Computer Conference
Keywords
fault-tolerant OM, separate monitor computer, object machine, monitoring function, OM design, computer system, fault-tolerant implementation, monitoring process, monitor machine, external monitoring, fault tolerance, architectural property, MM complex, real time, fault tolerant
Field
Technical specifications, Computer science, Real-time computing, Fault tolerance, Embedded system
DocType
Conference
ISBN
0-88283-049-X
Citations
9
PageRank
2.19
References
5
Authors
1
Name
Algirdas Avizienis
Order
1
Citations
3116
PageRank
351.14