Title
Credible execution of bounded-time parallel systems with delayed diagnosis
Abstract
This paper presents a forward recovery method for the fault-tolerant execution of parallel software systems on multicomputers, in which a fault is neither detected nor diagnosed until it prevents progress in the computation. The method minimizes the communication and synchronization overhead required to verify the reliability of the system and consequently minimizes the impact of fault tolerance on the throughput of the computation. We say the system is credible provided that it is diagnosable and complete, where complete means that at least one copy of each process exists on a fault-free processor. We apply the method to the process structure deriving from parallel, bounded-time decision systems and show through an exact Markov analysis that the method yields a highly credible system. We then introduce a much simpler but approximate Markov model that facilitates credibility analysis over a larger range of parameters and applications.
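The completeness condition stated in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the function name `is_complete`, the `placement` mapping, and the example processor numbering are hypothetical and chosen only for illustration. The sketch covers completeness only; diagnosability, the other half of credibility, is not modeled.

```python
from typing import Dict, Set


def is_complete(placement: Dict[str, Set[int]], faulty: Set[int]) -> bool:
    """Return True if every process has at least one copy on a fault-free processor.

    placement maps a (hypothetical) process name to the set of processor ids
    that hold a copy of it; faulty is the set of failed processor ids.
    """
    # A process survives if removing the faulty processors leaves a copy.
    return all(copies - faulty for copies in placement.values())


# Illustrative assumption: three processes, each replicated on two of three processors.
placement = {"p1": {0, 1}, "p2": {1, 2}, "p3": {2, 0}}

print(is_complete(placement, faulty={1}))     # every process keeps a copy
print(is_complete(placement, faulty={1, 2}))  # p2 loses both of its copies
```

With this placement, any single processor fault leaves the system complete, while the loss of two processors removes every copy of at least one process.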
Year: 1992
DOI: 10.1007/BF02241704
Venue: Computing
Keywords: multicomputer, decision-system, parallel processing, bounded-time parallel system, bounded-time decision system, real-time expert-system, facilitates credibility analysis, process structure, parallel software system, parallel systems, credible execution, forward recovery method, credible system, approximate markov model, complete mean, exact markov analysis, fault-free processor, fault-tolerance, delayed diagnosis
Field: Synchronization, Markov model, Computer science, Expert system, Markov chain, Algorithm, Software, Fault tolerance, Throughput, Computation
DocType: Journal
Volume: 48
Issue: 1
ISSN: 1436-5057
Citations: 0
PageRank: 0.34
References: 12
Authors: 2
Name: R. Shankar; Order: 1; Citations: 0; PageRank: 0.34
Name: Daniel P. Miranker; Order: 2; Citations: 947; PageRank: 188.72