Abstract
---
• Presents a scalable asynchronous progress design that requires:
  - No additional software or hardware resources.
  - No interrupts from the network adapter.
  - No change in host application code.
  - No administrative privileges.
• Improves the performance of all-to-one, one-to-all, and all-to-all MPI collective communication patterns by up to 60%, 36%, and 50%, respectively, at 816 processes.
• Reduces the runtimes of SPECMPI applications by 41% and of a P3DFFT kernel by 63%.
• Improves the throughput of the High Performance Linpack (HPL) benchmark by up to 28%.
• Presents large-scale evaluations of the proposed design against state-of-the-art designs in three MPI libraries (MVAPICH2, Intel MPI, and Open MPI) on five variants of many-core architectures, including Broadwell, Knights Landing, Skylake, and OpenPOWER, with InfiniBand and Omni-Path interconnects.
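The gains above come from letting MPI communication advance while the application computes. As a rough illustration only (not the paper's implementation), the C/MPI sketch below shows the standard nonblocking-collective overlap pattern that asynchronous progress designs are meant to accelerate; the buffer size and the choice of MPI_Iallreduce are arbitrary.

```c
/*
 * Minimal overlap sketch (illustrative, not the authors' design):
 * start a nonblocking collective, do independent computation, then
 * complete the operation. Without an asynchronous progress mechanism,
 * much of the transfer may only advance inside MPI_Wait/MPI_Test.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local[1024], global[1024];
    for (int i = 0; i < 1024; i++)
        local[i] = (double)(rank + i);

    /* Start a nonblocking reduction across all ranks. */
    MPI_Request req;
    MPI_Iallreduce(local, global, 1024, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* Independent computation intended to overlap with the transfer. */
    double acc = 0.0;
    for (int i = 0; i < 1024; i++)
        acc += local[i] * local[i];

    /* Complete the collective; with effective asynchronous progress,
       most of the communication has already finished by this point. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("local norm^2 = %f, global[0] = %f\n", acc, global[0]);

    MPI_Finalize();
    return 0;
}
```

A typical build and run would be `mpicc overlap.c -o overlap && mpirun -np 4 ./overlap`. How much of the collective actually completes during the compute loop depends on the MPI library's progress mechanism, which is what the paper's design improves without extra threads, interrupts, or application changes.
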
Year | DOI | Venue |
---|---|---|
2019 | 10.1016/j.parco.2019.03.003 | Parallel Computing |

Keywords | Field | DocType
---|---|---|
HPC, MPI, Communication computation overlap, Asynchronous progress, Collective operations, Blocking/nonblocking operations, SPECMPI, P3DFFT, HPL | Asynchronous communication, InfiniBand, Computer science, Xeon Phi, Parallel computing, Thread (computing), Throughput, Multi-core processor, Network interface, Performance improvement, Embedded system | Journal

Volume | ISSN | Citations
---|---|---|
85 | 0167-8191 | 1

PageRank | References | Authors
---|---|---|
0.37 | 0 | 6

Name | Order | Citations | PageRank |
---|---|---|---|
Amit Ruhela | 1 | 15 | 1.96
Hari Subramoni | 2 | 466 | 50.51 |
Sourav Chakraborty | 3 | 381 | 49.27 |
M. Bayatpour | 4 | 12 | 5.43 |
Pouya Kousha | 5 | 5 | 3.82 |
Dhabaleswar K. Panda | 6 | 5366 | 446.70 |