Abstract |
---|
HPC systems contain an increasing number of components, which decreases the mean time between failures. Checkpoint mechanisms help long-running applications survive such failures. A viable way to relieve the resulting pressure on the I/O backends is to deduplicate the checkpoints. However, little is known about the I/O savings that deduplication can achieve for HPC application checkpoints. In this paper, we present a broad study of the deduplication behavior of HPC application checkpointing and its impact on system design. |
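The abstract's core idea, deduplicating checkpoints so that only changed data reaches the I/O backend, can be illustrated with a minimal sketch of fixed-size block deduplication. This is a hypothetical illustration, not the paper's implementation: each checkpoint is split into blocks, each block is fingerprinted, and a block is written only if its fingerprint has not been seen before.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size for the sketch

def deduplicate(checkpoint: bytes, store: dict) -> int:
    """Add one checkpoint to the block store; return bytes actually written."""
    written = 0
    for off in range(0, len(checkpoint), BLOCK_SIZE):
        block = checkpoint[off:off + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()  # content fingerprint
        if fp not in store:                     # new block: must hit the I/O backend
            store[fp] = block
            written += len(block)
    return written

store = {}
ckpt1 = bytes(8192)                    # first checkpoint: two identical zero blocks
ckpt2 = bytes(4096) + b"\x01" * 4096   # second checkpoint: one block unchanged
print(deduplicate(ckpt1, store))  # 4096: the duplicate zero block is stored once
print(deduplicate(ckpt2, store))  # 4096: only the changed block is written
```

Successive checkpoints of a long-running application often share large unchanged regions, so the fraction of blocks already in the store is exactly the I/O saved; quantifying that fraction across real HPC applications is what the paper studies.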
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/CLUSTER.2016.32 | 2016 IEEE International Conference on Cluster Computing (CLUSTER) |
Keywords | Field | DocType |
---|---|---|
deduplication, checkpointing | Data deduplication, Mean time between failures, Computer science, Parallel computing, Systems design, Application checkpointing, Real-time computing, Redundancy (engineering), Operating system, Scalability, Distributed computing | Conference |
ISSN | ISBN | Citations |
---|---|---|
1552-5244 | 978-1-5090-3654-7 | 1 |
PageRank | References | Authors |
---|---|---|
0.35 | 22 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jürgen Kaiser | 1 | 42 | 3.99 |
Ramy Gad | 2 | 10 | 2.35 |
Tim Süß | 3 | 7 | 0.93 |
Federico Padua | 4 | 10 | 1.29 |
Lars Nagel | 5 | 76 | 13.58 |
André Brinkmann | 6 | 403 | 34.79 |