Abstract |
---|
As massively parallel processing (MPP) machines and their associated applications become larger, more work on resiliency is needed if those applications are to have a chance of running for significant lengths of time in the face of the expected component failure rates. This paper describes an approach for protecting large read-mostly in-memory data structures from various forms of failures by applying the concept of software erasure-correcting codes. A prototype library for this scheme was implemented on the Cray XMT and applied to a sample application. It is also portable to other global shared memory architectures that meet certain requirements, including the Cray XE. |

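To illustrate the general idea of a software erasure-correcting code named in the abstract, here is a minimal sketch using a single XOR parity block. It is an assumption for illustration only: the abstract does not say which code the prototype library uses, and the names `xor_blocks`, `encode`, and `recover` are hypothetical, not part of that library.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(blocks):
    """Compute one parity block for the data blocks (hypothetical helper)."""
    return xor_blocks(blocks)

def recover(blocks, parity, lost):
    """Rebuild the block at index `lost` from the survivors plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    return xor_blocks(survivors + [parity])

# Three equal-length blocks standing in for pieces of a read-mostly structure.
data = [b"node0---", b"node1---", b"node2---"]
parity = encode(data)

# Simulate losing block 1 (an erasure: the failed location is known).
damaged = list(data)
damaged[1] = None
restored = recover(damaged, parity, lost=1)
```

A single parity block tolerates only one known-location loss per group. Read-mostly data suits this kind of scheme because the parity has to be recomputed only when the data changes.
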
Year | DOI | Venue |
---|---|---|
2012 | 10.1109/IPDPSW.2012.198 | IPDPS Workshops |

Keywords | Field | DocType |
---|---|---|
global shared memory architecture, expected component failure rate, cray xe, various failures, prototype library, certain requirement, parallel processing, read-mostly in-memory data structures, large read-mostly in-memory data, associated application, sample application, cray xmt, resilience, databases, registers, data structures, xenon, memory management, erasure codes, face | Psychological resilience, Data structure, Shared memory, Cray XMT, Massively parallel, Computer science, Parallel computing, Software, Memory management, Erasure code, Distributed computing | Conference |

ISSN | Citations | PageRank |
---|---|---|
2164-7062 | 1 | 0.36 |

References | Authors |
---|---|
0 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Larry Kaplan | 1 | 138 | 8.52 |
Preston Briggs | 2 | 379 | 32.43 |
Miles Ohlrich | 3 | 1 | 0.36 |
Will Leslie | 4 | 1 | 0.36 |