Abstract |
---|
We present the Global View Resilience (GVR) system, a library that enables applications to add resilience in a portable, application-controlled fashion using versioned distributed arrays. We briefly describe GVR's interfaces for distributed arrays, versioning, and cross-layer error recovery. We illustrate how GVR can be used for rollback recovery and a wide range of additional error recovery techniques, including forward recovery for latent errors or silent data corruptions. Application results demonstrate that GVR's interfaces and implementation are portable, flexible (supporting a variety of recovery models), and efficient, and that they create a gentle-slope path to tolerating growing error rates in future systems. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/CLUSTER.2015.88 | Cluster Computing |
Keywords | Field | DocType |
Resilience,Fault tolerance,Exascale,Scalable computing,Application-based fault tolerance | Psychological resilience,Forward error correction,Monte Carlo method,Computer science,Parallel computing,Real-time computing,Fault tolerance,Rollback recovery,Distributed computing,Software versioning,Scalable computing | Conference |
ISSN | Citations | PageRank |
1552-5244 | 1 | 0.38 |
References | Authors |
2 | 9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Nan Dun | 1 | 41 | 5.93 |
Hajime Fujita | 2 | 36 | 5.29 |
Aiman Fang | 3 | 7 | 0.81 |
Yan Liu | 4 | 1 | 0.38 |
Andrew A. Chien | 5 | 3696 | 405.97 |
Pavan Balaji | 6 | 1475 | 111.48 |
Kamil Iskra | 7 | 642 | 46.46 |
Wesley Bland | 8 | 3 | 0.76 |
Andrew R. Siegel | 9 | 42 | 7.33 |