Title
Topology Agnostic Dynamic Quick Reconfiguration for Large-Scale Interconnection Networks
Abstract
Tolerating faults in the interconnection network is of vital importance in today's huge computer installations. Still, existing solutions fall short of being satisfactory. They require the system to fall back to a routing algorithm that is inferior to the original in terms of performance, in terms of the number of virtual channels needed, or both. Furthermore, since current hardware does not support dynamic reconfiguration, existing methods require the system to be halted while reconfiguration takes place in order to avoid deadlocks. In this paper we present a method that efficiently generates a new routing function in the presence of faults. The new routing function only reroutes the traffic that is affected by the fault, so that the performance of the original routing function is preserved to the extent possible. No specific functionality is required in the switches, and in the presence of faults we require exactly the same number of virtual channels as the original routing algorithm did. Finally, the new routing function is compatible with the old one, so that a deadlock-free dynamic transition between the old and the new routing function is immediately available. This means that our solution can easily be implemented on current InfiniBand platforms, e.g., through the OFED software stack. We demonstrate that the method is workable for meshes, tori and fat-trees, and that it can guarantee one-fault tolerance for all of these topologies.
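The two ideas in the abstract are that only affected traffic is rerouted and that the old and new routing functions must be compatible. Both can be illustrated with a small sketch. It rests on several simplifying assumptions that do not come from the paper. A routing function is modelled as a dict from hypothetical (src, dst) switch pairs to explicit paths, whereas InfiniBand actually uses destination-based forwarding tables. Affected routes are recomputed with plain BFS rather than with the authors' algorithm. Compatibility is tested with one conservative sufficient condition: the union of the old and new channel dependency graphs must be acyclic. The helper names `reroute_affected`, `channel_deps` and `is_acyclic` are illustrative only.

```python
from collections import deque


def uses_link(path, link):
    """True if the path traverses the (undirected) link in either direction."""
    return any({u, v} == set(link) for u, v in zip(path, path[1:]))


def shortest_path(adj, src, dst, failed):
    """BFS over surviving links; returns a switch list, or None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):  # sorted for deterministic tie-breaking
            if v not in prev and {u, v} != set(failed):
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def reroute_affected(adj, routes, failed):
    """Keep every route that avoids the failed link; recompute only the rest."""
    return {
        pair: shortest_path(adj, *pair, failed) if uses_link(path, failed) else path
        for pair, path in routes.items()
    }


def channel_deps(routes):
    """Channel dependency edges: consecutive directed links along each path."""
    deps = set()
    for path in routes.values():
        if path is None:
            continue
        channels = list(zip(path, path[1:]))
        deps.update(zip(channels, channels[1:]))
    return deps


def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    nodes = {n for edge in edges for n in edge}
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return removed == len(nodes)


# Toy 4-switch ring A-B-C-D-A with three routes; link A-B fails.
adj = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
old = {("A", "C"): ["A", "B", "C"], ("B", "D"): ["B", "C", "D"], ("A", "B"): ["A", "B"]}
new = reroute_affected(adj, old, ("A", "B"))
# Packets routed by either function may coexist during the transition, so the
# combined dependency graph must be cycle-free for this conservative check.
transition_is_safe = is_acyclic(channel_deps(old) | channel_deps(new))
print(new, transition_is_safe)
```

In this toy ring, the route B to D does not touch the failed link and is kept unchanged. Only the two routes that crossed A-B are recomputed. The naive BFS rerouting offers no compatibility guarantee of its own, which is why the sketch checks the combined dependency graph explicitly. According to the abstract, the paper's method constructs the new routing function so that compatibility holds by design.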
Year: 2012
DOI: 10.1109/CCGrid.2012.62
Venue: Cluster, Cloud and Grid Computing
Keywords: current hardware, virtual channel, deadlock free dynamic transition, new routing function, original routing function, original routing algorithm, dynamic reconfiguration, topology agnostic dynamic quick, large-scale interconnection networks, existing solution, current infiniband platform, routing algorithm, routing, mesh topology, network topology, reconfiguration, fault tolerance, fault tolerant, topology, hpc
Field: Link-state routing protocol, Multipath routing, Dynamic Source Routing, Static routing, Computer science, Enhanced Interior Gateway Routing Protocol, Destination-Sequenced Distance Vector routing, Computer network, Real-time computing, Routing table, Geographic routing, Distributed computing
DocType: Conference
ISBN: 978-1-4673-1395-7
Citations: 11
PageRank: 0.53
References: 17
Authors: 2
Name | Order | Citations | PageRank
Frank Olaf Sem-Jacobsen | 1 | 66 | 7.64
Olav Lysne | 2 | 797 | 54.53