Title | ||
---|---|---|
Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers |
Abstract | ||
---|---|---|
We describe the software architecture, technical features, and performance of TICK (Transparent Incremental Checkpointer at Kernel level), a system-level checkpointer implemented as a kernel thread, specifically designed to provide fault tolerance in Linux clusters. This implementation, based on the 2.6.11 Linux kernel, provides the essential functionality for transparent, highly responsive, and efficient fault tolerance based on full or incremental checkpointing at system level. TICK is completely user-transparent and does not require any changes to user code or system libraries; it is highly responsive: an interrupt, such as a timer interrupt, can trigger a checkpoint in as little as 2.5µs; and it supports incremental and full checkpoints with minimal overhead, less than 6% with full checkpointing to disk performed as frequently as once per minute. |
Year | DOI | Venue |
---|---|---|
2005 | 10.1109/SC.2005.76 | SC |
Keywords | Field | DocType |
software architecture,concurrent computing,fault tolerant,parallel computer,computer aided manufacturing,linux cluster,kernel,linux,fault tolerance | Interrupt,Permission,Computer science,Parallel computing,Thread (computing),Fault tolerance,Timer,Concurrent computing,Software architecture,Operating system,Linux kernel | Conference |
ISBN | Citations | PageRank |
1-59593-061-2 | 79 | 2.93 |
References | Authors | |
17 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Roberto Gioiosa | 1 | 459 | 31.78 |
José Carlos Sancho | 2 | 382 | 29.97 |
Song Jiang | 3 | 488 | 25.41 |
Fabrizio Petrini | 4 | 2050 | 165.82 |