Co-Designing Multi-Level Checkpoint Restart for MPI Applications | 1 | 0.36 | 2021 |
Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration | 0 | 0.34 | 2021 |
Towards Zero-Waste Recovery and Zero-Overhead Checkpointing in Ensemble Data Assimilation | 0 | 0.34 | 2021 |
Extending the OpenCHK Model with Advanced Checkpoint Features | 0 | 0.34 | 2020 |
Checkpoint Restart Support for Heterogeneous HPC Applications | 1 | 0.36 | 2020 |
Design and Study of Elastic Recovery in HPC Applications | 0 | 0.34 | 2020 |
Checkpoint/Restart Approaches for a Thread-Based MPI Runtime | 1 | 0.35 | 2019 |
Accelerating Hyperparameter Optimisation with PyCOMPSs | 0 | 0.34 | 2019 |
Application-Level Differential Checkpointing for HPC Applications with Dynamic Datasets | 2 | 0.38 | 2019 |
Approximating a Multi-Grid Solver | 0 | 0.34 | 2018 |
On the Applicability of PEBS based Online Memory Access Tracking for Heterogeneous Memory Management at Scale | 0 | 0.34 | 2018 |
Towards Ad Hoc Recovery for Soft Errors | 0 | 0.34 | 2018 |
Exploring the capabilities of support vector machines in detecting silent data corruptions | 0 | 0.34 | 2018 |
Toward General Software Level Silent Data Corruption Detection for Parallel Applications | 2 | 0.38 | 2017 |
Portable Topology-Aware MPI-I/O | 0 | 0.34 | 2017 |
Exploring Partial Replication to Improve Lightweight Silent Data Corruption Detection for HPC Applications | 3 | 0.39 | 2016 |
Unprotected computing: a large-scale study of DRAM raw error rate on a supercomputer | 11 | 0.55 | 2016 |
Spatial Support Vector Regression to Detect Silent Errors in the Exascale Era | 7 | 0.47 | 2016 |
Coping with recall and precision of soft error detectors | 1 | 0.35 | 2016 |
Monitoring strategies for scalable dynamic checkpointing | 0 | 0.34 | 2016 |
Which Verification for Soft Error Detection? | 4 | 0.38 | 2015 |
FTI: high performance fault tolerance interface for hybrid systems | 115 | 3.64 | 2011 |