Title
An adaptive checkpointing protocol to bound recovery time with message logging
Abstract
Numerous mathematical approaches have been proposed to determine the optimal checkpoint interval for minimizing total execution time of an application in the presence of failures. These solutions are often not applicable due to the lack of accurate data on the probability distribution of failures. Most current checkpoint libraries require application users to define a fixed time interval for checkpointing. The checkpoint interval usually implies the approximate maximum recovery time for single-process applications. However, actual recovery time can be much smaller when message logging is used. Due to this faster recovery, checkpointing may be more frequent than needed and thus unnecessary execution overhead is introduced. In this paper, an adaptive checkpointing protocol is developed to accurately enforce the user-defined recovery time and to reduce excessive checkpoints. The adaptive protocol has been implemented and evaluated using a receiver-based message logging algorithm on wired and wireless mobile networks. The results show that the protocol precisely maintains the user-defined maximum recovery times for several traces with varying message exchange rates. The mechanism incurs low overhead, avoids unnecessary checkpointing, and reduces failure-free execution time.
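The idea in the abstract, checkpointing only when the estimated recovery time would exceed the user-defined bound rather than at a fixed interval, can be illustrated with a minimal sketch. This is not the paper's protocol: the class `AdaptiveCheckpointer`, its fields, and the cost model (recovery time = checkpoint restore cost + estimated replay cost of work done since the last checkpoint) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveCheckpointer:
    """Hypothetical sketch; names and cost model are assumptions, not the paper's design."""

    max_recovery_time: float    # user-defined recovery-time bound (seconds)
    restore_cost: float         # assumed time to reload the last checkpoint (seconds)
    replay_time: float = 0.0    # estimated time to replay work since the last checkpoint
    checkpoints_taken: int = 0

    def estimated_recovery_time(self) -> float:
        # Recovery = reload the checkpoint, then replay logged work.
        return self.restore_cost + self.replay_time

    def record_work(self, replay_cost: float) -> bool:
        """Account for new work; checkpoint first if it would break the bound.

        Returns True when a checkpoint was taken. Assumes a single work item
        never costs more to replay than max_recovery_time - restore_cost.
        """
        took_checkpoint = False
        if self.estimated_recovery_time() + replay_cost > self.max_recovery_time:
            # A checkpoint resets the amount of work that must be replayed.
            self.replay_time = 0.0
            self.checkpoints_taken += 1
            took_checkpoint = True
        self.replay_time += replay_cost
        return took_checkpoint
```

With a bound of 10 s and a restore cost of 2 s, three work items that each cost 3 s to replay trigger a checkpoint only before the third item. A fixed-interval scheme would instead checkpoint on a timer regardless of how little replay work had accumulated.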
Year: 1999
DOI: 10.1109/RELDIS.1999.805100
Venue: SRDS
Keywords: user-defined maximum recovery time, user-defined recovery time, fixed time interval, failure free execution time, fault tolerant computing, optimal checkpoint interval, total execution time, failure free execution, recovery time, message logging, system recovery, faster recovery, avoids unnecessary checkpointing, adaptive checkpointing protocol, bound recovery time, adaptive checkpointing, approximate maximum recovery time, actual recovery time, probability distribution, random processes, application software, wireless application protocol, failure analysis, mathematical model
Field: Fixed time, Wireless, Computer science, Message logging, System recovery, Real-time computing, Probability distribution, Execution time, Distributed computing
DocType: Conference
ISSN: 1060-9857
ISBN: 0-7695-0290-3
Citations: 7
PageRank: 0.56
References: 29
Authors: 3
Name            Order  Citations  PageRank
Kuo-Feng Ssu    1      465        37.52
Bin Yao         2      40         2.53
W. Kent Fuchs   3      1469       279.02