Designing Non-Blocking Personalized Collectives With Near Perfect Overlap For Rdma-Enabled Clusters - Citegraph

Paper Info

Title
Designing Non-Blocking Personalized Collectives With Near Perfect Overlap For Rdma-Enabled Clusters

Abstract
Several techniques have been proposed in the past for designing non-blocking collective operations on high-performance clusters. While some of them required a dedicated process/thread or periodic probing to progress the collective, others needed specialized hardware solutions. The former technique, while applicable to any generic HPC cluster, had the drawback of stealing CPU cycles away from the compute task. The latter gave near perfect overlap but increased the total cost of the HPC installation due to the need for specialized hardware, and also had other drawbacks that limited its applicability. On the other hand, Remote Direct Memory Access (RDMA) technology and high-performance networks have been pushing the envelope of HPC performance to multi-petaflop levels. However, no scholarly work exists that explores the impact such RDMA technology can bring to the design of non-blocking collective primitives. In this paper, we take up this challenge and propose efficient designs of personalized non-blocking collective operations on top of the basic RDMA primitives. Our experimental evaluation shows that our proposed designs are able to deliver near perfect overlap of computation and communication for personalized collective operations on modern HPC systems at scale. At the microbenchmark level, the proposed RDMA-Aware collectives deliver improvements in latency of up to 89 times for MPI_Igatherv, 3.71 times for MPI_Ialltoall, and 3.23 times for MPI_Iscatter over the state-of-the-art designs. We also observe an improvement of up to 19% for the P3DFFT kernel at 8,192 cores on the Stampede supercomputing system at TACC.
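
The abstract's "near perfect overlap" claim can be made concrete with a small calculation. The sketch below uses one commonly used definition of overlap for non-blocking collective microbenchmarks; it is an assumption for illustration, not necessarily the metric used in the paper, and the function name and parameters are hypothetical.

```python
def overlap_percent(t_pure_comm: float, t_compute: float, t_total: float) -> float:
    """Estimate communication/computation overlap as a percentage.

    Assumed definition (not taken from the paper):
      t_pure_comm: time for the collective alone, with no computation
      t_compute:   time for the computation alone
      t_total:     time for collective start + computation + wait
    Communication time that is not hidden behind computation shows up as
    t_total - t_compute; overlap is the share of t_pure_comm that was hidden.
    """
    if t_pure_comm <= 0:
        raise ValueError("t_pure_comm must be positive")
    exposed = t_total - t_compute          # communication time not hidden
    overlap = 100.0 * (1.0 - exposed / t_pure_comm)
    # Clamp, since timer noise can push the raw value outside [0, 100].
    return max(0.0, min(100.0, overlap))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    print(overlap_percent(t_pure_comm=10.0, t_compute=50.0, t_total=55.0))
```

Under this definition, a run whose total time equals its compute time has fully hidden the collective, while a run whose total time equals compute time plus the pure communication time has hidden none of it.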

Year	DOI	Venue
2015	10.1007/978-3-319-20119-1_31	HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2015
Keywords	Field	DocType
Non-blocking collectives, Remote Direct Memory Access, HPC, InfiniBand	Drawback,Kernel (linear algebra),InfiniBand,Supercomputer,Computer science,Parallel computing,Thread (computing),Remote direct memory access,Instruction cycle,Computation,Distributed computing	Conference
Volume	ISSN	Citations
9137	0302-9743	5
PageRank	References	Authors
0.44	8	8

Authors (8 rows)

Name	Order	Citations	PageRank
Hari Subramoni	1	466	50.51
Ammar Ahmad Awan	2	91	10.84
Khaled Hamidouche	3	180	19.45
D. Pekurovsky	4	116	9.60
Akshay Venkatesh	5	159	13.36
Sourav Chakraborty	6	381	49.27
Karen Tomko	7	141	13.27
Dhabaleswar K. Panda	8	5366	446.70
