Title
Analysis and Evaluation of a New Algorithm Based Fault Tolerance for Computing Systems
Abstract
In this paper, the authors present a new approach to algorithm-based fault tolerance (ABFT) for high-performance computing systems. ABFT transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. ABFT techniques that detect errors rely on comparing parity values computed in two ways: processing the input parity values in parallel produces output parity values that can be compared with parity values regenerated from the original processed outputs. Convolutional codes can supply the redundancy. The method is a new approach to concurrent error correction in fault-tolerant computing systems, and the paper proposes it as a novel computing paradigm for providing fault tolerance in numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT.
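The two-way parity comparison described in the abstract can be illustrated with a minimal sketch. It assumes a plain weighted checksum over an integer matrix-vector product, which is a simpler form of redundancy than the convolutional codes the paper applies. The function names (`matvec`, `parity_row`, `abft_check`) and the sample data are hypothetical and do not come from the paper.

```python
# Hypothetical sketch: checksum-style ABFT check for y = A x.
# This uses a simple weighted checksum, NOT the paper's convolutional code.

def matvec(A, x):
    """Plain integer matrix-vector product (the 'normal' processing path)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def parity_row(A, weights):
    """Weighted column sums of A, i.e. the row vector w^T A."""
    cols = len(A[0])
    return [sum(w * row[j] for w, row in zip(weights, A)) for j in range(cols)]

def abft_check(A, x, y, weights):
    """Return True when both parity values agree (no error detected)."""
    # Way 1: push the input parity through the computation.
    predicted = sum(p * xi for p, xi in zip(parity_row(A, weights), x))
    # Way 2: regenerate the parity from the processed outputs.
    regenerated = sum(w * yi for w, yi in zip(weights, y))
    return predicted == regenerated

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [1, 0, 2]
weights = [1, 1, 1]  # assumed parity weights

y = matvec(A, x)
ok_clean = abft_check(A, x, y, weights)          # fault-free output

y_faulty = list(y)
y_faulty[1] += 5                                 # inject a single output error
ok_faulty = abft_check(A, x, y_faulty, weights)  # the mismatch exposes the fault
```

With integer data the two parity values can be compared exactly. A floating-point version would need a tolerance, and a single checksum of this kind only detects the error; locating and correcting it requires additional redundancy.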
Year
2012
DOI
10.4018/jghpc.2012010103
Venue
IJGHPC
Keywords
fault-intolerant system, computing systems, fault-tolerant computing system, abft technique, fault tolerance abft, fault tolerance approach, input parity value, output parity, new algorithm, new approach, high performance computing system, fault tolerance, redundancy, convolution code, error correction
Field
Convolutional code, Computer science, Parallel computing, Error detection and correction, Redundancy (engineering), Fault tolerance, Computing systems
DocType
Journal
Volume
4
Issue
1
ISSN
1938-0259
Citations
8
PageRank
0.50
References
15
Authors
3
Name                          Order  Citations  PageRank
Hodjat Hamidi                 1      58         6.65
Abbas Vafaei                  2      61         7.47
Seyed Amir Hassan Monadjemi   3      23         1.76