Abstract | ||
---|---|---|
Parallel single-level store (PSLS) systems integrate a shared virtual memory and a parallel file system. They provide programmers with a global address space including both memory and file data. PSLS systems implemented in a cluster thus represent a natural support for long-running parallel applications, combining both the natural shared memory programming model and a large and efficient file system. However, the need to tolerate failures in such a system increases with the size of applications. In this paper we present a highly-available parallel single-level store system (HA-PSLS), which smoothly integrates a backward error recovery high-availability mechanism into a PSLS system. Our system is able to tolerate multiple transient failures, a single permanent failure, and power cut failures affecting the whole cluster, without requiring any specialized hardware. For this purpose, HA-PSLS relies on a high degree of integration (and reusability) of high-availability and standard features. A prototype integrating our high-availability support has been implemented and we show some performance results in the paper. Copyright (C) 2003 John Wiley & Sons, Ltd. |
Year | DOI | Venue |
---|---|---|
2003 | 10.1002/cpe.739 | Concurrency and Computation: Practice and Experience |
Keywords | DocType | Volume |
parallel single-level store, high-availability, fault tolerance, checkpointing, replication, integration, parallel file systems, shared virtual memory | Journal | 15 |
Issue | ISSN | Citations |
10 | 1532-0626 | 0 |
PageRank | References | Authors |
0.34 | 15 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Anne-Marie Kermarrec | 1 | 6649 | 453.63 |
Christine Morin | 2 | 226 | 26.78 |