Title
Static Analysis for Application-Level Checkpointing of MPI Programs
Abstract
Application-level checkpointing is a promising technology in the domain of large-scale scientific computing. The consistency of the global checkpoint must be carefully guaranteed so that the computation can be restored correctly. Usually, complex coordinated protocols are employed to ensure this consistency, and they require logging orphan or in-transit messages during checkpointing. These protocols complicate recovery of the computation and increase checkpoint overhead because of message logging. In this paper, a new method is proposed that ensures the consistency of the global checkpoint by static analysis. The method identifies safe checkpointing regions in MPI programs, where the global checkpoint is always strongly consistent. All checkpoints are placed in those safe checkpointing regions. During checkpointing, the method does not log any messages and introduces no extra overhead. The method was implemented and integrated into ALEC, a source-to-source precompiler for automating application-level checkpointing. The experimental results show that our method is effective.
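To make the terms in the abstract concrete, the following is a minimal sketch of what "strongly consistent" means for a global checkpoint, assuming the usual definitions: an in-transit message is sent before the sender's checkpoint and received after the receiver's, an orphan message is received before the receiver's checkpoint but sent after the sender's, and a strongly consistent checkpoint has neither. The names Message, classify, and is_strongly_consistent and the position-based event model are hypothetical illustrations. This is not the paper's static analysis, which works on MPI source code rather than on recorded events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One point-to-point message, described by local event positions (hypothetical model)."""
    src: int       # sending rank
    send_pos: int  # position of the send in the sender's event sequence
    dst: int       # receiving rank
    recv_pos: int  # position of the receive in the receiver's event sequence


def classify(msg, cut):
    """Classify a message relative to a global checkpoint.

    cut maps each rank to the number of local events completed before
    that rank took its checkpoint.
    """
    sent_before = msg.send_pos < cut[msg.src]
    received_before = msg.recv_pos < cut[msg.dst]
    if sent_before and not received_before:
        return "in-transit"  # would have to be logged and replayed on recovery
    if received_before and not sent_before:
        return "orphan"      # receiver remembers a message the sender never sent
    return "ok"              # send and receive fall on the same side of the cut


def is_strongly_consistent(messages, cut):
    """True if the cut has neither orphan nor in-transit messages."""
    return all(classify(m, cut) == "ok" for m in messages)


# Rank 0 sends at position 2; rank 1 receives at position 3.
messages = [Message(src=0, send_pos=2, dst=1, recv_pos=3)]

safe_cut = {0: 5, 1: 5}   # both ranks checkpoint after the exchange
late_cut = {0: 5, 1: 1}   # receiver checkpoints before receiving
early_cut = {0: 1, 1: 5}  # sender checkpoints before sending
```

In this toy model, a safe checkpointing region corresponds to a set of cut positions for which is_strongly_consistent holds, so nothing needs to be logged when the checkpoint is taken.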
Year: 2008
DOI: 10.1109/HPCC.2008.39
Venue: HPCC
Keywords: application-level checkpointing, large-scale scientific computing, static analysis, global checkpoint, safe checkpointing region, safe checkpoint region, mpi program, in-transit message, extra overhead, mpi programs, new method, protocols, message passing, algorithms, scientific computing, fault tolerance
Field: Computer science, Static analysis, Parallel computing, Real-time computing, Fault tolerance, Message passing, Distributed computing, Computation
DocType: Conference
Citations: 3
PageRank: 0.40
References: 9
Authors: 5
Name            Order   Citations   PageRank
Panfeng Wang    1       34          6.12
Yunfei Du       2       72          14.62
Hongyi Fu       3       68          12.50
Xuejun Yang     4       678         73.26
Haifang Zhou    5       35          9.33