Title |
---|
A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems |
Abstract |
---|
Studying the interaction among applications, MPI runtimes, and the fabric they run on is critical to understanding application performance. No high-performance, scalable tool exists that enables understanding this interplay on modern multi-petaflop systems. Designing such a tool is non-trivial and involves multiple components, including 1) profiling and collecting data from the network and the MPI library, 2) storing the data, and 3) rendering it. Furthermore, achieving this scalably and with minimal overhead is a challenging task. We take up this challenge and propose a high-performance and scalable network-based performance analysis tool for MPI libraries operating on modern networks like InfiniBand and Omni-Path. Our designs facilitate caching and pre-rendering, allowing a cluster with 6,541 nodes, 764 switches, and 16,893 network links to render in just 30 seconds, a 44X speedup over non-prerendered solutions. The proposed lock-free and optimized memory-backed storage design enables the tool to handle over a quarter million inserts into the database every 45 seconds (data from 27,504 switch ports and 104,656 MPI processes). The tool has been successfully deployed and validated on HPC systems at OSC and on Comet at SDSC. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/CLUSTER.2017.78 | 2017 IEEE International Conference on Cluster Computing (CLUSTER) |
Keywords | Field | DocType |
Network-Based Performance Analysis,Tool,HPC,MPI | Data collection,InfiniBand,Computer science,Parallel computing,Real-time computing,Data profiling,Rendering (computer graphics),Scalability,Speedup,Distributed computing | Conference |
ISSN | ISBN | Citations |
1552-5244 | 978-1-5386-2327-5 | 1 |
PageRank | References | Authors |
0.35 | 4 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hari Subramoni | 1 | 466 | 50.51 |
Xiaoyi Lu | 2 | 602 | 60.53 |
Dhabaleswar K. Panda | 3 | 5366 | 446.70 |