Title
Checkpointing and Recovery of Shared Memory Parallel Applications in a Cluster
Abstract
This paper describes issues in the design and implementation of checkpointing and recovery modules for the Kerrighed DSM cluster system. Our design is for a DSM supporting the sequential consistency model. The mechanisms are general enough to be used in a number of different checkpointing and recovery protocols. It is designed to support common optimizations for performance suggested in literature, while staying light-weight during fault-free execution. We also present preliminary performance results of the current implementation.
Year
2003
DOI
10.1109/CCGRID.2003.1199403
Venue
CCGrid
Keywords
sequential consistency model,recovery protocol,share memory parallel applications,recovery module,fault-free execution,common optimizations,kerrighed dsm cluster system,present preliminary performance result,current implementation,fault tolerance,kernel,cluster,protocols,linux,rollback,sequential consistency,distributed shared memory,cluster computing,operating systems,memory management,shared memory
Field
Kerrighed,Sequential consistency,Shared memory,System recovery,Computer science,Parallel computing,Fault free,Real-time computing,Workstation clusters,Distributed shared memory,Rollback,Distributed computing
DocType
Conference
ISBN
0-7695-1919-9
Citations
4
PageRank
0.53
References
10
Authors
3
Name                  Order  Citations  PageRank
Ramamurthy Badrinath  1      188        16.28
Christine Morin       2      435        34.65
Geoffroy Vallée       3      123        15.62