Title
Compiler-Assisted Application-Level Checkpointing for MPI Programs
Abstract
Application-level checkpointing can reduce the overhead of fault tolerance by minimizing the amount of checkpoint data. However, this technique requires the programmer to manually choose the critical data that should be saved. In this paper, we first propose a live-variable analysis method for MPI programs. We then provide a method, based on this analysis, for optimizing data saving in application-level checkpointing. Building on this theoretical foundation, we implement a source-to-source precompiler (ALEC) to automate application-level checkpointing. Finally, on a 512-CPU cluster system, we evaluate the performance of five FORTRAN/MPI programs that ALEC has transformed to integrate checkpointing features. The experimental results show that i) application-level checkpointing based on live-variable analysis for MPI programs can efficiently reduce the amount of checkpoint data, thereby decreasing the overhead of checkpoint and restart; and ii) ALEC is capable of automating application-level checkpointing correctly and effectively.
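To illustrate the idea the abstract relies on, here is a minimal sketch of textbook backward live-variable analysis used to decide which variables need saving at a checkpoint. It is not ALEC's algorithm: the paper's analysis targets FORTRAN/MPI programs and must account for message passing, which this sketch ignores. The program representation (a list of (defs, uses) pairs plus a successor map) and the names live_variables, checkpoint_set, PROGRAM, and SUCC are assumptions made for this example only.

```python
def live_variables(stmts, succ):
    """Iterative backward dataflow.

    stmts[i] is a (defs, uses) pair of variable-name sets for statement i.
    succ[i] lists the indices of the statements that may execute after i.
    Equation: live_in[i] = uses[i] | (live_out[i] - defs[i]).
    """
    n = len(stmts)
    live_in = [set() for _ in range(n)]
    live_out = [set() for _ in range(n)]
    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for i in reversed(range(n)):
            defs, uses = stmts[i]
            out = set()
            for s in succ[i]:
                out |= live_in[s]
            new_in = set(uses) | (out - set(defs))
            if out != live_out[i] or new_in != live_in[i]:
                live_out[i], live_in[i] = out, new_in
                changed = True
    return live_in, live_out


def checkpoint_set(stmts, succ, index):
    """Variables live just before statement `index`.

    Only these can still be read after a restart at that point, so a
    checkpoint placed there can skip every other variable.
    """
    live_in, _ = live_variables(stmts, succ)
    return live_in[index]


# Hypothetical straight-line program, one (defs, uses) pair per statement:
PROGRAM = [
    ({"a"}, set()),          # 0: a = init()
    ({"tmp"}, {"a"}),        # 1: tmp = a * 2
    ({"b"}, {"tmp"}),        # 2: b = tmp + 1
    ({"c"}, {"a", "b"}),     # 3: c = a + b   (checkpoint placed before this)
    ({"tmp"}, {"c"}),        # 4: tmp = c
    (set(), {"tmp"}),        # 5: output(tmp)
]
SUCC = {0: [1], 1: [2], 2: [3], 3: [4], 4: [5], 5: []}

if __name__ == "__main__":
    print(sorted(checkpoint_set(PROGRAM, SUCC, 3)))  # prints ['a', 'b']
```

In this example tmp is overwritten at statement 4 before it is read again, so it is dead at the checkpoint and does not need to be saved. That is the kind of reduction in checkpoint data the abstract describes.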
Year
2008
DOI
10.1109/ICDCS.2008.25
Venue
ICDCS
Keywords
mpi programs, checkpoint data, application-level checkpointing, live-variable analysis method, 512-cpu cluster system, live-variable analysis, analysis method, integrated checkpointing feature, critical data, mpi program, optimization method, compiler-assisted application-level checkpointing, scientific computing, message passing, concurrent computing, distributed computing, algorithm design and analysis, algorithms, application software, fault tolerant, high performance computing
Field
Programmer, Algorithm design, Computer science, Parallel computing, Fortran, Compiler, Fault tolerance, Message passing, Operating system, Distributed computing
DocType
Conference
ISSN
1063-6927
Citations
4
PageRank
0.45
References
9
Authors
6
Name            Order  Citations  PageRank
Xuejun Yang     1      678        73.26
Panfeng Wang    2      34         6.12
Hongyi Fu       3      68         12.50
Yunfei Du       4      72         14.62
Zhiyuan Wang    5      57         6.37
Jia Jia         6      36         4.01