Abstract |
---|
It has been observed that scaling problems are likely to manifest when MPI applications are launched at large scale, where scale is characterized by both the degree of parallelism and the problem size. Because the complexity of MPI collectives depends directly on both the parallelism scale and the problem size, collectives often trigger scaling problems. The root cause of a scaling problem can lie outside the MPI library, and such problems are easily exposed by the dynamic interaction between user code and the MPI library as scale increases. Irregular collectives suffer the most, because their C int displacement arrays are easily corrupted by integer overflow. Scaling problems can also stem from bugs in released MPI libraries, owing both to the lack of systematic testing of MPI libraries and to the platform or environment dependence of some scaling problems. Hence it is important for library users to test on their own platforms to expose potential scaling problems. Fixing a scaling problem is challenging, so users often wait a long time for an official fix, and sometimes a fix is not possible at all because of the difficulty of reproducing the bug, identifying its root cause, and developing the fix. To improve user productivity, we establish the necessity of user-side testing and provide a protection layer that avoids scaling problems non-intrusively, i.e., without requiring any changes to the MPI library or to user programs. This layer provides an immediate remedy when an official fix is not readily available. |
Property | Value |
---|---|
Year | 2018 |
DOI | 10.1109/IPDPSW.2018.00076 |
Venue | 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) |
Keywords | MPI, Scaling problem, Workaround |
Field | Integer overflow, Degree of parallelism, Computer science, Parallel processing, Software bug, Parallel computing, Scaling, Root cause, Distributed computing, Systematic testing |
DocType | Conference |
ISSN | 2164-7062 |
ISBN | 978-1-5386-5556-6 |
Citations | 0 |
PageRank | 0.34 |
References | 15 |
Authors | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hongbo Li | 1 | 204 | 30.18 |
Zizhong Chen | 2 | 924 | 69.93 |
Rajiv Gupta | 3 | 4301 | 364.53 |
Min Xie | 4 | 20 | 7.20 |