Title | ||
---|---|---|
Analysis and Evaluation of a New Algorithm Based Fault Tolerance for Computing Systems |
Abstract | ||
---|---|---|
In this paper, the authors present a new approach to algorithm-based fault tolerance (ABFT) for high-performance computing systems. The ABFT approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. ABFT techniques that detect errors rely on comparing parity values computed in two ways: processing the input parity values in parallel produces output parity values that can be compared with parity values regenerated from the original processed outputs. Convolutional codes can be applied to provide the redundancy. The method is a new approach to concurrent error correction in fault-tolerant computing systems and a novel computing paradigm for providing fault tolerance in numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT. |
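
The two-way parity comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' scheme: the paper applies convolutional codes, whereas this example swaps in a plain sum checksum over a matrix-vector product, which is the simplest linear case. The function names (`mat_vec`, `column_checksum`, `abft_check`) and the injected fault are hypothetical, and integer arithmetic is assumed so that the two parities can be compared exactly.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]


def mat_vec(a: Matrix, x: Vector) -> Vector:
    """Compute y = A x, the operation being protected."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def column_checksum(a: Matrix) -> Vector:
    """Sum each column of A, i.e. the parity row 1^T A."""
    return [sum(col) for col in zip(*a)]


def abft_check(a: Matrix, x: Vector, y: Vector) -> bool:
    """Return True when both parity values agree.

    Parity 1 is produced by processing the input side: (1^T A) x.
    Parity 2 is regenerated from the processed outputs: 1^T y.
    For a fault-free y = A x the two are equal.
    """
    parity_from_inputs = sum(c * x_j for c, x_j in zip(column_checksum(a), x))
    parity_from_outputs = sum(y)
    return parity_from_inputs == parity_from_outputs


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [1, 0, -2]

y = mat_vec(a, x)
ok_clean = abft_check(a, x, y)

# Hypothetical fault: corrupt one output element after the computation.
y_faulty = list(y)
y_faulty[1] += 5
ok_faulty = abft_check(a, x, y_faulty)

print(ok_clean, ok_faulty)
```

A single sum checksum like this only detects that an error occurred. Locating and correcting the faulty element needs additional redundancy, which is the role the abstract assigns to convolutional codes. With floating-point data, the equality test would have to become a comparison against a tolerance.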
Year | DOI | Venue |
---|---|---|
2012 | 10.4018/jghpc.2012010103 | IJGHPC |
Keywords | Field | DocType |
fault-intolerant system,computing systems,fault-tolerant computing system,abft technique,fault tolerance abft,fault tolerance approach,input parity value,output parity,new algorithm,new approach,high performance computing system,fault tolerance,redundancy,convolution code,error correction | Convolutional code,Computer science,Parallel computing,Error detection and correction,Redundancy (engineering),Fault tolerance,Computing systems | Journal |
Volume | Issue | ISSN |
4 | 1 | 1938-0259 |
Citations | PageRank | References |
8 | 0.50 | 15 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hodjat Hamidi | 1 | 58 | 6.65 |
Abbas Vafaei | 2 | 61 | 7.47 |
Seyed Amir Hassan Monadjemi | 3 | 23 | 1.76 |