Title
Designing Scalable and High-Performance MPI Libraries on Amazon Elastic Fabric Adapter
Abstract
Amazon has recently announced a new network interface named Elastic Fabric Adapter (EFA), targeted at tightly coupled HPC workloads. In this paper, we characterize the features, capabilities, and performance of the adapter. We also explore how its transport models, such as UD (Unreliable Datagram) and SRD (Scalable Reliable Datagram), impact the design of high-performance MPI libraries. Our evaluations show that the hardware-level reliability provided by SRD can significantly improve the performance of MPI communication. We also propose a new zero-copy transfer mechanism over unreliable and orderless channels that can reduce the communication latency of large messages. The proposed design also shows significant improvement in collective and application performance over the vendor-provided MPI library.
Year
2019
DOI
10.1109/HOTI.2019.00023
Venue
2019 IEEE Symposium on High-Performance Interconnects (HOTI)
Keywords
Elastic Fabric Adapter, EFA, SRD, EC2, MPI, HPC
DocType
Conference
ISSN
1550-4794
ISBN
978-1-7281-5526-5
Citations
0
PageRank
0.34
References
7
Authors
4
Name                   Order   Citations/PageRank
Sourav Chakraborty     1       38149.27
Shulei Xu              2       11.73
Hari Subramoni         3       46650.51
Dhabaleswar K. Panda   4       5366446.70