Title
Distributed Checkpointing Mechanism for a Parallel File System
Abstract
Checkpointing techniques have been widely studied in the literature as a way to recover from failures in sequential, distributed and parallel environments. However, most of the checkpointing mechanisms proposed so far focus only on recovering the application data. If the application performs I/O operations on disk files, such schemes may not work correctly, because they do not provide rollback-recovery for the file contents. In this paper, we present a distributed checkpointing mechanism for a parallel file system that can be integrated with any of the existing application checkpointing algorithms. We present three different file checkpointing schemes, test them within this mechanism, and discuss them in detail. The proposed distributed mechanism was integrated into PIOUS, a public-domain parallel file system developed for the PVM distributed computing environment.
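The abstract does not describe the three file checkpointing schemes. To illustrate what rollback-recovery of file contents involves, here is a minimal sketch of one generic technique, an undo log. It records the bytes each write overwrites and the file length at the last checkpoint, so that a rollback restores the file to its checkpointed state. If a process rolls back but its file keeps the writes made after the checkpoint, re-execution can duplicate or corrupt file data. The class `CheckpointedFile` and its methods are hypothetical: this is not claimed to be one of the paper's schemes or part of the PIOUS interface, and it models a single file in memory rather than a distributed parallel file system.

```python
class CheckpointedFile:
    """Hypothetical in-memory file with undo-log rollback (illustration only)."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = bytearray(data)
        # File length at the last checkpoint; construction counts as the first checkpoint.
        self.ckpt_size = len(self.data)
        # (offset, old bytes) pairs recorded since the last checkpoint.
        self.undo: list[tuple[int, bytes]] = []

    def write(self, offset: int, buf: bytes) -> None:
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = offset + len(buf)
        if end > len(self.data):
            # Grow the file; the gap is zero-filled, like a sparse write.
            self.data.extend(b"\0" * (end - len(self.data)))
        # Save the bytes about to be overwritten before modifying them.
        self.undo.append((offset, bytes(self.data[offset:end])))
        self.data[offset:end] = buf

    def checkpoint(self) -> None:
        # Commit the current contents: discard the undo log and record the length.
        self.undo.clear()
        self.ckpt_size = len(self.data)

    def rollback(self) -> None:
        # Undo writes in reverse order, then drop anything past the checkpointed length.
        for offset, old in reversed(self.undo):
            self.data[offset:offset + len(old)] = old
        del self.data[self.ckpt_size:]
        self.undo.clear()


if __name__ == "__main__":
    f = CheckpointedFile(b"hello")
    f.write(5, b" world")
    f.write(0, b"J")
    print(bytes(f.data))  # contents after two writes
    f.rollback()
    print(bytes(f.data))  # contents restored to the initial checkpoint
```

In a distributed setting, a file checkpoint like this would have to be taken in coordination with the application checkpoint, so that the file and process states roll back to a consistent point together. That coordination is what the paper's mechanism addresses, and this single-file sketch does not model it.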
Year: 2000
Venue: PVM/MPI
Keywords: checkpointing mechanism, previous application, I/O operation, checkpointing technique, parallel environment, parallel file system, different file, public-domain parallel file system, disk file, application data, file content, distributed computing environment, public domain, fault tolerant
Field: File system, Virtual machine, Distributed Computing Environment, Self-certifying File System, Computer science, Parallel computing, Application checkpointing, Fault tolerance, Parallel I/O, Distributed computing
DocType: Conference
Volume: 1908
ISSN: 0302-9743
ISBN: 3-540-41010-4
Citations: 0
PageRank: 0.34
References: 9
Authors: 3
Name, Order, Citations, PageRank
Vítor N. Távora, 1, 0, 0.34
Luís Moura Silva, 2, 312, 36.22
João Gabriel Silva, 3, 618, 63.55