Recomputation Enabled Efficient Checkpointing. - Citegraph

Paper Info

Title
Recomputation Enabled Efficient Checkpointing.

Abstract
Systematic checkpointing of the machine state makes roll-back and restart of execution from a safe state possible upon detection of an error. The energy overhead of checkpointing, however, as incurred by storage and communication of the machine state, grows with the frequency of checkpointing. Amortizing this overhead becomes especially challenging, considering the growth of expected error rates as an artifact of contemporary technology scaling, as checkpointing frequency tends to increase with increasing error rates. At the same time, due to imbalances in technology scaling, recomputing data can become more energy efficient than storing and retrieving precomputed data. Recomputation of data (which otherwise would be read from a checkpoint) can reduce the frequency of checkpointing along with the data size to be checkpointed, and thereby mitigate checkpointing overhead. This paper quantitatively characterizes a recomputation-enabled checkpointing framework which can reduce the storage overhead of checkpointing by up to 23.91%, the performance overhead by up to 11.92%, and the energy overhead by up to 12.53%.

Year	Venue	Field
2017	arXiv: Distributed, Parallel, and Cluster Computing	Technology scaling,Computer science,Efficient energy use,Amortizing loan,Distributed computing
DocType	Volume	Citations
Journal	abs/1710.04685	0
PageRank	References	Authors
0.34	10	2

Authors (2 rows)

Name	Order	Citations	PageRank
Ismail Akturk	1	32	6.56
Ulya R. Karpuzcu	2	277	22.27

Cited by (0 rows)

References (10 rows)
