Title
Using Multirail Networks in High-Performance Clusters
Abstract
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations and enhance the fault tolerance of current high-performance clusters. We present an extensive experimental comparison of the behavior of various allocation schemes in terms of bandwidth and latency. We show that striping messages over multiple rails can substantially reduce network latency, depending on average message size, network load, and allocation scheme. The compared methods include a basic round-robin rail allocation, a local-dynamic allocation based on local knowledge, and a dynamic rail allocation that reserves both end-points of a message before sending it. The last method is shown to perform better than the others at higher loads: up to 49% better than local-knowledge allocation and 37% better than round-robin allocation. This allocation scheme also shows lower latency and, for sufficiently large messages, saturates at higher loads. Most importantly, the proposed allocation scheme scales well with the number of rails and with message size. In addition, we propose a hybrid algorithm that combines the benefits of local-dynamic allocation for short messages with those of dynamic allocation for large messages.
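The allocation schemes named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Node` class, the per-rail `busy_until` timestamps, and the 4096-byte `threshold` in `hybrid` are all assumptions introduced here for illustration, and the dynamic scheme's end-point reservation is approximated by choosing the rail on which both end-points become free earliest.

```python
class Node:
    """Hypothetical model of a node attached to `num_rails` independent networks."""

    def __init__(self, num_rails):
        self.num_rails = num_rails
        # Assumed bookkeeping: time at which this node's end-point on each rail is free.
        self.busy_until = [0.0] * num_rails
        self.cursor = 0  # round-robin position


def round_robin(src):
    """Cycle through the rails in fixed order, ignoring load."""
    rail = src.cursor
    src.cursor = (src.cursor + 1) % src.num_rails
    return rail


def local_dynamic(src):
    """Use only the sender's local knowledge: pick its earliest-free rail."""
    return min(range(src.num_rails), key=lambda r: src.busy_until[r])


def dynamic(src, dst):
    """Consider both end-points: pick the rail on which both are free earliest."""
    return min(
        range(src.num_rails),
        key=lambda r: max(src.busy_until[r], dst.busy_until[r]),
    )


def hybrid(src, dst, size, threshold=4096):
    """Local-dynamic for short messages, dynamic for large ones.

    The threshold value is an arbitrary placeholder, not a figure from the paper.
    """
    if size < threshold:
        return local_dynamic(src)
    return dynamic(src, dst)
```

With two rails, a sender whose rail 0 is busy longer than rail 1, and a receiver whose rail 1 is busy longest, `local_dynamic` selects rail 1 while `dynamic` selects rail 0. This is the kind of situation in which knowledge of the remote end-point changes the decision.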
Year
2003
DOI
10.1002/cpe.725
Venue
Concurrency and Computation: Practice and Experience
Keywords
communication protocols,parallel architectures,high-performance clusters,communication libraries,various allocation scheme,high-performance interconnection networks,dynamic rail allocation,basic round-robin rail allocation,higher load,messages large enough,proposed allocation scheme scale,allocation scheme,performance evaluation,multirail networks,local-dynamic allocation,round-robin allocation,local-knowledge allocation,fault tolerant,hybrid algorithm,routing,communication protocol,local knowledge,comparative method
DocType
Journal
Volume
15
Issue
7-8
ISSN
1532-0626
ISBN
0-7695-1116-3
Citations
24
PageRank
1.91
References
6
Authors
5
Name                Order  Citations  PageRank
Salvador Coll       1      609        57.12
Eitan Frachtenberg  2      1060       85.08
Fabrizio Petrini    3      2050       165.82
Adolfy Hoisie       4      1465       123.85
Leonid Gurvits      5      315        132.60