Abstract | ||
---|---|---|
Distributed Shared Virtual Memory (DSVM) systems provide a shared memory abstraction on distributed memory architectures. Such systems ease parallel application programming because the shared-memory programming model is often more natural than the message-passing paradigm. However, the probability of failure of a DSVM increases with the number of sites. Thus, fault tolerance mechanisms must be implemented in order to allow processes to continue their execution in the event of a failure. This paper gives an overview of recoverable DSVMs (RDSVMs) that provide a checkpointing mechanism to restart parallel computations in the event of a site failure. |
Year | DOI | Venue |
---|---|---|
1997 | 10.1109/71.615441 | IEEE Trans. Parallel Distrib. Syst. |
Keywords | Field | DocType |
checkpointing mechanism, parallel application programming, dsvm increase, site failure, fault tolerance mechanism, shared memory abstraction, virtual memory, parallel computation, memory architecture, shared-memory programming model, bit error rate, concurrent computing, hardware, parallel programming, fault tolerant, fault tolerance, parallel computer, message passing, shared memory, availability, writing, distributed systems, programming model | Uniform memory access, Programming paradigm, Shared memory, Computer science, Distributed memory, Data diffusion machine, Real-time computing, Fault tolerance, Distributed shared memory, Memory architecture, Distributed computing | Journal |
Volume | Issue | ISSN |
8 | 9 | 1045-9219 |
Citations | PageRank | References |
36 | 1.62 | 30 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Christine Morin | 1 | 226 | 26.78 |
Isabelle Puaut | 2 | 1708 | 89.84 |