Abstract |
---|
Large scientific applications deployed on current petascale systems spend a significant amount of their execution time dumping checkpoint files to remote storage. New fault-tolerant techniques will be critical to efficiently exploit post-petascale systems. In this work, we propose a low-overhead, high-frequency multi-level checkpoint technique in which we integrate a highly reliable, topology-aware Reed-Solomon encoding in a three-level checkpoint scheme. We efficiently hide the encoding time using one dedicated fault-tolerance thread per node. We implement our technique in the Fault Tolerance Interface (FTI). We evaluate the correctness of our performance model and conduct a study of the reliability of our library. To demonstrate the performance of FTI, we present a case study of the Mw 9.0 Tohoku, Japan earthquake simulation with SPECFEM3D on TSUBAME2.0. We demonstrate a checkpoint overhead as low as 8% on sustained 0.1 petaflops runs (1152 GPUs) while checkpointing at high frequency. |
Year | DOI | Venue |
---|---|---|
2011 | 10.1145/2063384.2063427 | SC |
Keywords | Field | DocType |
---|---|---|
three-level checkpoint scheme,low-overhead high-frequency multi-level checkpoint,checkpoint file,execution time,fault tolerance interface fti,hybrid system,new fault tolerant technique,checkpoint overhead,high performance fault tolerance,encoding time,performance model,case study,fault tolerance,user interfaces,writing,fault tolerant,fault tolerant system,computer model,compression,reed solomon,computational modeling,topology,data reduction,encoding,high frequency,data intensive computing | Earthquake simulation,Data-intensive computing,Computer science,Correctness,Parallel computing,Fault tolerance,User interface,Petascale computing,Hybrid system,Embedded system,Distributed computing,Encoding (memory) | Conference |
Citations | PageRank | References |
---|---|---|
115 | 3.64 | 27 |
Authors |
---|
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Leonardo Bautista-Gomez | 1 | 148 | 11.33 |
Seiji Tsuboi | 2 | 115 | 3.64 |
Dimitri Komatitsch | 3 | 339 | 22.87 |
Franck Cappello | 4 | 3775 | 251.47 |
Naoya Maruyama | 5 | 836 | 55.34 |
Satoshi Matsuoka | 6 | 3773 | 359.36 |