Title
Handling Persistent States in Process Checkpoint/Restart Mechanisms for HPC Systems
Abstract
Computer clusters are today the reference architecture for high-performance computing. The large number of nodes in these systems induces a high failure rate, which makes fault tolerance mechanisms such as process checkpoint/restart a required technology for exploiting clusters effectively. Most process checkpoint/restart implementations handle only volatile state and ignore the persistent state of applications, which can lead to inconsistent application restarts. In this paper, we introduce an efficient persistent state checkpoint/restoration approach that can be integrated with a large number of file systems. To avoid the performance issues of stable storage that relies on synchronous replication mechanisms, we present a failure resilience scheme optimized for such persistent state checkpointing techniques in a distributed environment. Initial evaluations of our implementation in the kDFS distributed file system show that our proposal has a negligible performance impact.
Year
2009
DOI
10.1109/CCGRID.2009.29
Venue
CCGrid
Keywords
file system, negligible performance impact, hpc systems, high failure rate, performance issue, account persistent state, process checkpoint, persistent state, handling persistent states, efficient persistent state checkpoint, large number, restart mechanisms, failure resilience scheme, computer clusters, distributed file system, fault tolerant, software fault tolerance, registers, distributed architecture, fault tolerance, high performance computing, grid computing, reference architecture, data mining, computer architecture, resilience, writing, probability density function, software maintenance, failure rate
Field
Distributed File System, Grid computing, Supercomputer, Distributed Computing Environment, Computer science, Software fault tolerance, Real-time computing, Fault tolerance, Reference architecture, Computer cluster, Distributed computing
DocType
Conference
Citations
1
PageRank
0.35
References
16
Authors
3
Name             Order  Citations  PageRank
Pierre Riteau    1      76         7.69
Adrien Lebre     2      90         7.45
Christine Morin  3      435        34.65