Title
Recomputation Enabled Efficient Checkpointing.
Abstract
Systematic checkpointing of the machine state makes roll-back and restart of execution from a safe state possible upon detection of an error. The energy overhead of checkpointing, however, as incurred by storage and communication of the machine state, grows with the frequency of checkpointing. Amortizing this overhead becomes especially challenging because expected error rates grow as an artifact of contemporary technology scaling, and checkpointing frequency tends to increase with error rates. At the same time, due to imbalances in technology scaling, recomputing data can become more energy efficient than storing and retrieving precomputed data. Recomputation of data (which otherwise would be read from a checkpoint) can reduce the frequency of checkpointing along with the size of the data to be checkpointed, and thereby mitigate checkpointing overhead. This paper quantitatively characterizes a recomputation-enabled checkpointing framework which can reduce the storage overhead of checkpointing by up to 23.91%, the performance overhead by up to 11.92%, and the energy overhead by up to 12.53%.
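The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not the paper's framework: the `Value` record, the `plan_checkpoint` helper, and all energy figures are hypothetical, and the model assumes a value is dropped from the checkpoint whenever regenerating it is estimated to cost less energy than storing and later reloading it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    name: str
    size_bytes: int          # footprint if written to the checkpoint
    recompute_energy: float  # hypothetical energy to regenerate the value


def plan_checkpoint(values, store_energy_per_byte, load_energy_per_byte):
    """Split values into those stored in the checkpoint and those recomputed on recovery."""
    stored, recomputed = [], []
    for v in values:
        # Assumed cost of the checkpoint path: write once, read back once.
        checkpoint_energy = v.size_bytes * (store_energy_per_byte + load_energy_per_byte)
        if v.recompute_energy < checkpoint_energy:
            recomputed.append(v)
        else:
            stored.append(v)
    return stored, recomputed


def checkpoint_size(values):
    """Total number of bytes the given values would occupy in a checkpoint."""
    return sum(v.size_bytes for v in values)


# Made-up machine state; the numbers carry no measured meaning.
state = [
    Value("a", 64, 10.0),
    Value("b", 64, 500.0),
    Value("c", 128, 40.0),
]
stored, recomputed = plan_checkpoint(
    state, store_energy_per_byte=1.0, load_energy_per_byte=0.5
)
```

With these made-up numbers only `b` stays in the checkpoint, so the checkpoint shrinks from 256 to 64 bytes. A real system would also have to account for the inputs needed to recompute a value and for the recovery latency that recomputation adds.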
Year: 2017
Venue: arXiv: Distributed, Parallel, and Cluster Computing
Field: Technology scaling, Computer science, Efficient energy use, Amortizing loan, Distributed computing
DocType: Journal
Volume: abs/1710.04685
Citations: 0
PageRank: 0.34
References: 10
Authors: 2
Name              Order  Citations  PageRank
Ismail Akturk     1      32         6.56
Ulya R. Karpuzcu  2      277        22.27