Title
A Scalable Method for Signalling Dynamic Reconfiguration Events with OpenSM
Abstract
Rerouting around faulty components, on-the-fly policy changes, and migration of jobs all require reconfiguration of data structures in the Queue Pairs residing in the hosts on an InfiniBand cluster. In addition to a proper implementation at the host, the subnet manager needs to implement a scalable method for signalling reconfiguration events to the hosts. In this paper we propose and evaluate three different implementations for signalling dynamic reconfiguration events with OpenSM. Through our evaluation we demonstrate a scalable solution for signalling host-side reconfiguration events in an InfiniBand network, based on an example where dynamic network reconfiguration combined with a topology-agnostic routing function is used to avoid malfunctioning components. Through measurements on our test cluster and an analytical study, we show that our best proposal reduces reconfiguration latency by more than 90% and in certain situations eliminates it completely. Furthermore, the processing overhead in the subnet manager is shown to be minimal.
Year
2011
DOI
10.1109/CCGrid.2011.48
Venue
CCGrid
Keywords
infiniband cluster,dynamic network reconfiguration,dynamic reconfiguration event,scalable method,scalable solution,subnet manager,signalling dynamic reconfiguration events,reconfiguration event,host-side reconfiguration event,infiniband network,reconfiguration latency,topology,network topology,fault tolerance,routing,data structure,data structures,fault tolerant
Field
Dynamic network analysis,InfiniBand,Computer science,Queue,Computer network,Network topology,Subnet,Fault tolerance,Control reconfiguration,Distributed computing,Scalability
DocType
Conference
Citations
1
PageRank
0.36
References
17
Authors
2
Name, Order, Citations, PageRank
Wei Lin Guay, 1, 32, 2.35
Sven-Arne Reinemo, 2, 184, 12.64