Abstract | ||
---|---|---|
Massively parallel systems represent a new challenge for fault tolerance. The designers of such systems cannot expect that no part of the system will fail. With the significant increase in complexity and in the number of components, the chance of single or multiple failures is no longer negligible. It is clear that redundancy, reconfigurability and diagnosis techniques must be incorporated at the design stage itself and not as a subsequent add-on. In this paper we discuss the fault tolerance techniques developed for MEMSY, a massively parallel architecture. These techniques can, in principle, be easily transferred to other distributed shared memory multiprocessors. |
Year | DOI | Venue |
---|---|---|
1993 | 10.1007/3-540-57307-0_24 | Parallel Computer Architectures |
Keywords | Field | DocType |
fault tolerance, shared memory multiprocessors, fault tolerant, distributed shared memory | Supercomputer architecture, Uniform memory access, Shared memory, Massively parallel, Computer science, Parallel computing, Distributed memory, Cache-only memory architecture, Fault tolerance, Distributed shared memory, Distributed computing | Conference |
ISBN | Citations | PageRank |
3-540-57307-0 | 6 | 0.73 |
References | Authors | |
13 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mario Dal Cin | 1 | 282 | 40.09 |
A. Crygier | 2 | 15 | 1.72 |
H. Hessenauer | 3 | 15 | 1.72 |
U. Hildebrand | 4 | 18 | 2.26 |
J. Hönig | 5 | 13 | 1.28 |
Wolfgang Hohl | 6 | 65 | 9.25 |
Edgar Michel | 7 | 7 | 1.07 |
András Pataricza | 8 | 514 | 55.25 |