Abstract | ||
---|---|---|
Although many kernel-level and user-level libraries and systems support checkpointing running processes and resuming their operation, it remains difficult to provide an easy-to-use tool for checkpointing parallel applications. In this work, we aim to develop an easy-to-use, user-guided library that supports checkpointing of parallel MPI applications executed within the CLUSTERIX environment, i.e., a collection of distributed HPC clusters. We propose a programmer-assisted approach with process state packing and unpacking at the code level for SPMD HPC applications. Although the library is at an early stage of development, we present checkpoint/restart times and execution times of an application interrupted by checkpointing for the proposed approach, compared with the same application linked with the ckpt user-level library. |
Year | DOI | Venue |
---|---|---|
2004 | 10.1109/PARELEC.2004.72 | PARELEC |
Keywords | Field | DocType |
MPI applications, parallel application, parallel MPI application, SPMD HPC application, ckpt user level library, towards easy-to-use checkpointing, HPC cluster, application execution, easy-to-use user-guided library, user level library, easy-to-use tool, code level, parallel programming, message passing | Kernel (linear algebra), SPMD, Process state, Computer science, Parallel computing, Message passing, Unpacking, Operating system | Conference |
ISBN | Citations | PageRank |
0-7695-2080-4 | 1 | 0.37 |
References | Authors |
5 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Pawel Czarnul | 1 | 121 | 21.11 |
Arkadiusz Urbaniak | 2 | 4 | 0.96 |
Marcin Fraczak | 3 | 1 | 0.37 |
Maciej Dyczkowski | 4 | 1 | 0.37 |
Bartlomiej Balcerek | 5 | 1 | 0.37 |