Title
Fault Tolerance in Distributed Shared Memory Multiprocessors
Abstract
Massively parallel systems represent a new challenge for fault tolerance. The designers of such systems cannot expect that no parts of the system will fail. With the significant increase in the complexity and number of components, the chance of single or multiple failures is no longer negligible. It is clear that redundancy, reconfigurability, and diagnosis techniques must be incorporated at the design stage itself and not as a subsequent add-on. In this paper we discuss the fault tolerance techniques developed for MEMSY, a massively parallel architecture. These techniques can, in principle, be easily transferred to other distributed shared memory multiprocessors.
Year: 1993
DOI: 10.1007/3-540-57307-0_24
Venue: Parallel Computer Architectures
Keywords: fault tolerance, shared memory multiprocessors, fault tolerant, distributed shared memory
Field: Supercomputer architecture, Uniform memory access, Shared memory, Massively parallel, Computer science, Parallel computing, Distributed memory, Cache-only memory architecture, Fault tolerance, Distributed shared memory, Distributed computing
DocType: Conference
ISBN: 3-540-57307-0
Citations: 6
PageRank: 0.73
References: 13
Authors: 8
Name               Order  Citations  PageRank
Mario Dal Cin      1      282        40.09
A. Crygier         2      15         1.72
H. Hessenauer      3      15         1.72
U. Hildebrand      4      18         2.26
J. Hönig           5      13         1.28
Wolfgang Hohl      6      65         9.25
Edgar Michel       7      7          1.07
András Pataricza   8      514        55.25