Title |
---|
Designing Non-Blocking Personalized Collectives With Near Perfect Overlap For RDMA-Enabled Clusters |
Abstract |
---|
Several techniques have been proposed in the past for designing non-blocking collective operations on high-performance clusters. While some of them required a dedicated process/thread or periodic probing to progress the collective, others needed specialized hardware solutions. The former approach, while applicable to any generic HPC cluster, had the drawback of stealing CPU cycles away from the compute task. The latter gave near perfect overlap but increased the total cost of the HPC installation due to the need for specialized hardware, and it also had other drawbacks that limited its applicability. On the other hand, Remote Direct Memory Access (RDMA) technology and high-performance networks have been pushing the envelope of HPC performance to multi-petaflop levels. However, no scholarly work exists that explores the impact such RDMA technology can bring to the design of non-blocking collective primitives. In this paper, we take up this challenge and propose efficient designs of personalized non-blocking collective operations on top of the basic RDMA primitives. Our experimental evaluation shows that our proposed designs are able to deliver near perfect overlap of computation and communication for personalized collective operations on modern HPC systems at scale. At the microbenchmark level, the proposed RDMA-Aware collectives deliver improvements in latency of up to 89 times for MPI_Igatherv, 3.71 times for MPI_Ialltoall, and 3.23 times for MPI_Iscatter over the state-of-the-art designs. We also observe an improvement of up to 19% for the P3DFFT kernel at 8,192 cores on the Stampede supercomputing system at TACC. |
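The "near perfect overlap" claim can be made concrete with a small model. The sketch below assumes one commonly used definition of overlap: time the collective alone (post then wait), time a compute phase of similar length, then time post + compute + wait, and report how much of the pure communication time did not show up in the combined run. The abstract does not state the exact metric the authors used, and the function names and the two-case timing model (background progress versus progress only inside the wait call) are hypothetical illustrations, not the paper's benchmark or implementation.

```python
def overlap_percent(t_pure_comm: float, t_compute: float, t_overall: float) -> float:
    """Percentage of communication time hidden behind computation.

    Assumed definition (not taken from the paper):
      t_pure_comm: latency of the collective alone (post + wait, no compute)
      t_compute:   duration of the compute phase placed between post and wait
      t_overall:   measured time of post + compute + wait
    """
    if t_pure_comm <= 0:
        raise ValueError("t_pure_comm must be positive")
    exposed = t_overall - t_compute  # communication time that was NOT hidden
    pct = 100.0 * (1.0 - exposed / t_pure_comm)
    return max(0.0, min(100.0, pct))  # clamp to [0, 100]


def modeled_overall(t_comm: float, t_compute: float, progresses_in_background: bool) -> float:
    """Toy timing model (an assumption, not a measurement).

    If the network advances the collective without the CPU (e.g. RDMA-driven),
    communication and computation run concurrently; otherwise the collective
    only advances inside the wait call and the two phases serialize.
    """
    if progresses_in_background:
        return max(t_comm, t_compute)
    return t_compute + t_comm


if __name__ == "__main__":
    t_comm = 100.0     # hypothetical collective latency (microseconds)
    t_compute = 100.0  # compute phase sized to match the communication time
    for bg in (False, True):
        t_all = modeled_overall(t_comm, t_compute, bg)
        print(f"background progress={bg}: overlap={overlap_percent(t_comm, t_compute, t_all):.1f}%")
```

Running it prints 0.0% for the serialized case and 100.0% for the background-progress case, which mirrors the contrast the abstract draws between designs that need the host CPU to progress the collective and designs where the network does the work.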
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/978-3-319-20119-1_31 | HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2015 |
Keywords | Field | DocType |
---|---|---|
Non-blocking collectives, Remote Direct Memory Access, HPC, InfiniBand | Drawback, Kernel (linear algebra), InfiniBand, Supercomputer, Computer science, Parallel computing, Thread (computing), Remote direct memory access, Instruction cycle, Computation, Distributed computing | Conference |
Volume | ISSN | Citations |
---|---|---|
9137 | 0302-9743 | 5 |
PageRank | References | Authors |
---|---|---|
0.44 | 8 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hari Subramoni | 1 | 466 | 50.51 |
Ammar Ahmad Awan | 2 | 91 | 10.84 |
Khaled Hamidouche | 3 | 180 | 19.45 |
D. Pekurovsky | 4 | 116 | 9.60 |
Akshay Venkatesh | 5 | 159 | 13.36 |
Sourav Chakraborty | 6 | 381 | 49.27 |
Karen Tomko | 7 | 141 | 13.27 |
Dhabaleswar K. Panda | 8 | 5366 | 446.70 |