Title
Adaptive Checkpoint Replication for Supporting the Fault Tolerance of Applications in the Grid
Abstract
A major challenge in a dynamic Grid with thousands of interconnected machines is fault tolerance. The more resources and components are involved, the more complicated and error-prone the system becomes. Migol is an adaptive Grid middleware that addresses the fault tolerance of Grid applications and services by providing the capability to recover applications from checkpoint files automatically. A critical aspect of automatic recovery is the availability of checkpoint files: if a resource becomes unavailable, it is very likely that the associated storage is also unreachable, e.g., due to a network partition. A strategy to increase the availability of checkpoints is replication. In this paper, we present the Checkpoint Replication Service. A key feature of this service is the ability to automatically replicate and monitor checkpoints in the Grid.
Year
2008
DOI
10.1109/NCA.2008.38
Venue
Cambridge, MA
Keywords
fault tolerance, grid application, checkpoint file, critical aspect, adaptive checkpoint replication, automatic recovery, associated storage, checkpoint replication service, adaptive grid middleware, dynamic grid, availability, bandwidth, fault tolerant, middleware, grid computing, computer networks, replication, application software, software fault tolerance, computer applications
Field
Network partition, Middleware, Grid computing, Computer science, Software fault tolerance, Computer network, Fault tolerance, Computer Applications, Application software, Grid, Distributed computing
DocType
Conference
ISBN
978-0-7695-3192-2
Citations
4
PageRank
0.44
References
14
Authors
2
Name              Order   Citations   PageRank
Andre Luckow      1       38          2.83
Bettina Schnor    2       142         26.36