Title
Cooperative rendezvous protocols for improved performance and overlap.
Abstract
With the emergence of larger multi-/many-core clusters and new areas of HPC applications, the performance of large-message communication is becoming more important. MPI libraries use different rendezvous protocols to perform large-message communication. However, existing rendezvous protocols neither take the overall communication pattern into account nor make optimal use of the sender and receiver CPUs. In this work, we propose a cooperative rendezvous protocol that can provide up to 2x improvement in intra-node bandwidth and latency for large messages. We also propose designs to dynamically choose the best rendezvous protocol for each message based on the overall communication pattern. Finally, we show how these improvements can increase the overlap of intra-node communication and computation with inter-node communication and lead to application-level benefits at scale. We evaluate the proposed designs on three different architectures (Intel Xeon, Knights Landing, and OpenPOWER) against state-of-the-art MPI libraries including MVAPICH2 and Open MPI. Compared to existing designs, the proposed designs show benefits of up to 19% with Graph500, 16% with CoMD, and 10% with MiniGhost.
Year: 2018
DOI: 10.1109/SC.2018.00031
Venue: SC
Keywords: Protocols, Receivers, Libraries, Peer-to-peer computing, Hardware, Computer architecture, Runtime
Field: Latency (engineering), Computer science, Peer to peer computing, Computer network, Communication source, Bandwidth (signal processing), Rendezvous, Xeon, Graph500, Computation, Distributed computing
DocType: Conference
ISBN: 978-1-5386-8384-2
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name                      Order  Citations  PageRank
Sourav Chakraborty        1      381        49.27
M. Bayatpour              2      12         5.43
Jahanzeb Maqbool Hashmi   3      42         7.43
Hari Subramoni            4      466        50.51
Dhabaleswar K. Panda      5      5366       446.70