Title
Dynamic Fault Tolerance with Misrouting in Fat Trees
Abstract
Fault tolerance is critical for the efficient utilisation of large computer systems. Dynamic fault tolerance keeps the network available while faults occur, whereas static fault tolerance requires the network to be halted so that it can be reconfigured. Although dynamic fault tolerance may lead to less efficient solutions than static fault tolerance, it allows much higher system availability. In this paper we devise a dynamic fault-tolerant adaptive routing algorithm for the fat tree, a widely used interconnect topology, which relies on misrouting around link faults. We show that any combination of fewer than (number of switch ports)/2 link faults is guaranteed to be tolerated without the need for additional network resources for deadlock freedom. Larger numbers of link faults are also tolerated with high probability. Simulation results show that network performance degrades very little when faults are dynamically tolerated.
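The abstract describes misrouting only at a high level. The Python sketch below illustrates the general idea under assumptions that are ours, not the paper's. It models a hypothetical two-level fat tree of k-port switches (k leaf switches, k/2 spine switches, one link between every leaf-spine pair). Each switch sees only its own links, and the packet carries a hypothetical `tried` header field recording the spines it has already visited. When a spine finds its down link to the destination faulty, it misroutes the packet down to another leaf, which climbs to an untried spine. The function `route` and the `L<i>`/`S<j>` switch names are illustrative inventions. The sketch is not the authors' algorithm and does not model their deadlock-freedom argument. In this toy model, each misroute uses up a distinct fault on a destination link. So with fewer than k/2 faults, an untried spine with a working up link always remains, and the packet is eventually delivered.

```python
from typing import List, Optional, Set, Tuple

# A link is identified by (leaf index, spine index).
Link = Tuple[int, int]


def route(k: int, src: int, dst: int, faults: Set[Link]) -> Optional[List[str]]:
    """Forward a packet from leaf switch `src` to leaf switch `dst` in a
    hypothetical two-level fat tree of k-port switches: k leaves, each with
    k // 2 up links, one to each of the k // 2 spines.

    Each switch decides using only the state of its own links. Returns the
    switches visited ("L<i>" for leaves, "S<j>" for spines), or None if the
    packet cannot be delivered.
    """
    spines = k // 2

    def alive(leaf: int, spine: int) -> bool:
        return (leaf, spine) not in faults

    path = [f"L{src}"]
    if src == dst:
        return path
    leaf = src
    tried: Set[int] = set()  # hypothetical header field: spines already visited
    while True:
        # Upward: adaptively take any working up link to an untried spine
        # (lowest index here, so the result is deterministic).
        ups = [s for s in range(spines) if s not in tried and alive(leaf, s)]
        if not ups:
            return None
        spine = ups[0]
        tried.add(spine)
        path.append(f"S{spine}")
        # Downward: deliver if this spine's link to dst works.
        if alive(dst, spine):
            path.append(f"L{dst}")
            return path
        # Misroute: the link to dst is faulty, so send the packet down to
        # another leaf (or back to the current one if no other link works)
        # from which it can climb to a spine it has not tried yet.
        downs = [m for m in range(k) if m not in (dst, leaf) and alive(m, spine)]
        if downs:
            leaf = downs[0]
        path.append(f"L{leaf}")
```

For example, with k = 4 and the link between leaf 3 and spine 0 faulty, `route(4, 0, 3, {(3, 0)})` returns `['L0', 'S0', 'L1', 'S1', 'L3']`. The packet reaches spine 0, is misrouted down to leaf 1, and climbs to spine 1, which still reaches leaf 3. Because the loop adds one spine to `tried` per iteration, it always terminates after at most k/2 upward hops.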
Year
2006
DOI
10.1109/ICPP.2006.36
Venue
ICPP
Keywords
fault tolerant, system dynamics, network performance, fat tree, adaptive routing
Field
Stuck-at fault, Adaptive routing algorithm, Resource, Interconnect topology, Computer science, Deadlock, Computer network, Fault tolerance, Fat tree, Distributed computing, Network performance
DocType
Conference
ISSN
0190-3918
ISBN
0-7695-2636-5
Citations
4
PageRank
0.43
References
12
Authors
4
Name                      Order  Citations  PageRank
Frank Olaf Sem-Jacobsen   1      66         7.64
Tor Skeie                 2      1103       74.67
Olav Lysne                3      797        54.53
José Duato                4      3481       294.85