Abstract | ||
---|---|---|
The efficient implementation of collective communication patterns in a parallel machine is a challenging design effort that requires the solution of many problems. In this paper we present an in-depth description of how the Quadrics network supports both hardware- and software-based collectives. We describe the main features of the two building blocks of this network: a network interface that can perform zero-copy user-level communication, and a wormhole routing switch. We also focus our attention on the routing and flow control algorithms, on deadlock avoidance, and on how the processing nodes are integrated in a global, virtual shared memory. Experiments conducted on a 64-node AlphaServer cluster indicate that the time to complete the hardware-based barrier synchronization on the whole network is as low as 6 microsecs, with very good scalability. Good latency and scalability are also achieved with the software-based synchronization, which takes about 15 microsecs. With the broadcast, similar performance is achieved by the hardware- and software-based implementations, which can deliver messages of up to 256 bytes in 13 microsecs and can sustain an asymptotic bandwidth of 288 Mbytes/sec on all the nodes. The hardware-based barrier is almost insensitive to network congestion, with 93% of the synchronizations taking less than 20 microsecs when the network is flooded with background traffic of unicast messages. The software-based implementation, on the other hand, suffers a significant performance degradation. Under high load the hardware broadcast maintains a reasonably good latency, delivering messages of up to 2 KB in 200 microsecs, while the software broadcast suffers from slightly higher latencies inherited from the synchronization mechanism. With large messages, both broadcast algorithms experience a significant degradation of the sustained bandwidth. |
Year | DOI | Venue |
---|---|---|
2001 | 10.1109/NCA.2001.962513 | Cambridge, MA |
Keywords | Field | DocType |
software-based implementation,network congestion,quadrics network,whole network,software-based collective communication,good latency,hardware broadcast,software broadcast,broadcast algorithm,network interface,software-based collective,processing,communications,design,implementation,degradation,synchronization,algorithms,routing,scalability,bandwidth,network interfaces,latency,performance,switches,concurrency control,broadcasting,network design,computer networks,flow control | Synchronization,Network planning and design,Concurrency control,Computer science,Deadlock,Computer network,Network congestion,Unicast,Computer hardware,Network interface,Distributed computing,Scalability | Conference |
ISBN | Citations | PageRank |
0-7695-1432-4 | 27 | 4.41 |
References | Authors | |
10 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Fabrizio Petrini | 1 | 2050 | 165.82 |
Salvador Coll | 2 | 609 | 57.12 |
Eitan Frachtenberg | 3 | 1060 | 85.08 |
Adolfy Hoisie | 4 | 1465 | 123.85 |