Abstract |
---|
This paper presents a forward recovery method for the fault-tolerant execution of parallel software systems on multicomputers such that faults are neither detected nor diagnosed until the fault prevents progress in the computation of the system. The method minimizes the communication and synchronization overhead required to verify the reliability of the system and consequently minimizes the impact of fault-tolerance on the throughput of the computation. We say the system is credible provided that the system is diagnosable and complete, where complete means that at least one copy of each process exists on a fault-free processor. We apply the method to the process structure deriving from parallel, bounded-time decision systems and show through an exact Markov analysis that the method will yield a very credible system. We then introduce a much simpler but approximate Markov model that facilitates credibility analysis over a larger range of parameters and applications. |
Year | DOI | Venue |
---|---|---|
1992 | 10.1007/BF02241704 | Computing |
Keywords | Field | DocType |
multicomputer,decision-system,parallel processing,bounded-time parallel system,bounded-time decision system,real-time expert-system,facilitates credibility analysis,process structure,parallel software system,parallel systems,credible execution,forward recovery method,credible system,approximate markov model,complete mean,exact markov analysis,fault-free processor,fault-tolerance,delayed diagnosis | Synchronization,Markov model,Computer science,Expert system,Markov chain,Algorithm,Software,Fault tolerance,Throughput,Computation | Journal |
Volume | Issue | ISSN |
48 | 1 | 1436-5057 |
Citations | PageRank | References |
0 | 0.34 | 12 |
Authors |
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
R. Shankar | 1 | 0 | 0.34 |
Daniel P. Miranker | 2 | 947 | 188.72 |