Abstract | ||
---|---|---|
Efficient reduction algorithms are crucial to many large-scale, parallel scientific applications. While previous algorithms constrain processing to the host CPU, we explore and utilise the processors in modern cluster Network Interface Cards (NICs). We present the design issues, solutions, analytical models, and experimental evaluations of a family of NIC-based reduction algorithms. Through experiments on the ALC cluster at Lawrence Livermore National Laboratory, which connects 960 dual-CPU nodes with the Quadrics QsNet interconnect, we find NIC-based reductions to be more efficient than host-based implementations. At large scale, our NIC-based reductions are more than twice as fast as the host-based, production-level MPI implementation. |
Year | DOI | Venue |
---|---|---|
2006 | 10.1504/IJHPCN.2006.010635 | IJHPCN |
Keywords | Field | DocType |
modern cluster,analytical model,cluster computing,aleece,quadrics qsnet,efficient reduction algorithm,nic-based operations,reduce,nic-based reduction algorithm,lawrence livermore national laboratory,host-based implementation,nic-based reduction,collective communication,alc cluster,network interface cards,large-scale cluster,network interface card,floating point,standard deviation,linux cluster | Cluster (physics),Network clustering,Computer science,Parallel computing,Collective communication,Algorithm,Implementation,Interconnection,Computer cluster,Distributed computing | Journal |
Volume | Issue | Citations |
4 | 3/4 | 5 |
PageRank | References | Authors |
0.51 | 21 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Fabrizio Petrini | 1 | 2050 | 165.82 |
Adam Moody | 2 | 431 | 24.18 |
Juan Fernandez | 3 | 269 | 23.17 |
Eitan Frachtenberg | 4 | 1060 | 85.08 |
Dhabaleswar K. Panda | 5 | 5366 | 446.70 |