Abstract |
---|
Fault tolerance is a critical issue in large-scale computing. The fault-tolerant parallel algorithm (FTPA) is an application-level technique for tolerating hardware failures. FTPA achieves fast failure recovery by means of parallel recomputing, but it complicates the coding of the application program. This paper uses compiler technology to automate the design of FTPA and introduces the implementation of a tool called GiFT (Get it Fault-Tolerant). GiFT uses extended data-flow analysis to choose the state needed for failure recovery, applies the parallel recomputing time model to compute the optimal number of recomputing processes, and uses parallelization techniques to generate parallel recomputing code. The experimental results show that GiFT correctly transforms original MPI programs into their FTPA counterparts, and that the performance of GiFT-generated FTPA programs is comparable to that of hand-modified FTPA programs. |
Year | DOI | Venue |
---|---|---|
2008 | 10.1109/ICPADS.2008.89 | ICPADS |
Keywords | Field | DocType |
parallel recomputing,hand-modified ftpa program,parallel recomputing time model,mpi programs,failure recovery,fault-tolerant parallel algorithm,recomputing process,gift-generated ftpa program,automating ftpa implementation,fast failure recovery,ftpa counterpart,parallel recomputing code,fault tolerant,computational modeling,data flow analysis,parallel algorithm,message passing,parallel algorithms,algorithms,fault tolerance | Computer science,Parallel algorithm,Parallel computing,Data-flow analysis,Real-time computing,Coding (social sciences),Compiler,Exploit,Fault tolerance,Time model,Message passing,Distributed computing | Conference |
Citations | PageRank | References |
0 | 0.34 | 10 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hongyi Fu | 1 | 68 | 12.50 |
Yunfei Du | 2 | 72 | 14.62 |
Panfeng Wang | 3 | 34 | 6.12 |
Jia Jia | 4 | 36 | 4.01 |
Xuejun Yang | 5 | 678 | 73.26 |