Title
On the modelling of optimal coordinated checkpoint period in supercomputers.
Abstract
This work revises current assumptions adopted in the checkpointing modelling and evaluates their impact on the attained prediction of the optimal coordinated single-level checkpoint period. An accurate a priori assessment of the optimal checkpoint period for a given computing facility is necessary as it drives the incurred overhead due to frequent checkpointing and, as a result, implies a drop in the resource steady-state availability. The present study discusses the impact of the order of approximation used in the single-level coordinated checkpoint modelling and follows on extending previous results of the optimal checkpoint period to explore the effects of the checkpoint rate on the cluster performance under total execution time and energy consumption policies, and in terms of resource availability. A consequence of a prescribed checkpoint rate with current technology is a critical size of the cluster above which the attained availability is too poor to become a cost-effective platform. Thus, some guidelines for the cluster sizing are indicated.
Year: 2019
DOI: 10.1007/s11227-018-2621-1
Venue: The Journal of Supercomputing
Keywords: Coordinated checkpoint, Cluster availability, Optimal checkpoint period, Single-level checkpoint
Field: Computer science, A priori and a posteriori, Execution time, Sizing, Energy consumption, Distributed computing
DocType: Journal
Volume: 75
Issue: 2
ISSN: 1573-0484
Citations: 2
PageRank: 0.39
References: 19
Authors: 3
Name | Order | Citations | PageRank
José A. Moríñigo | 1 | 3 | 2.44
Manuel A. Rodriguez-Pascual | 2 | 21 | 7.57
Rafael Mayo Garcia | 3 | 36 | 5.60