Title
Identifying Degree and Sources of Non-Determinism in MPI Applications Via Graph Kernels
Abstract
As the scientific community prepares to deploy an increasingly complex and diverse set of applications on exascale platforms, the need to assess the reproducibility of simulations and to identify the root causes of reproducibility failures grows correspondingly. One of the greatest challenges to reproducibility at exascale is the inherent non-determinism in inter-process communication. The use of non-deterministic communication constructs is necessary to boost performance, but communication non-determinism can also hamper software correctness and result reproducibility. To address this challenge, we propose a software framework for identifying the percentage and sources of communication non-determinism. We model parallel executions as directed graphs and leverage graph kernels to characterize run-to-run variations in inter-process communication. We demonstrate the effectiveness of graph kernel similarity as a proxy for non-determinism by showing that these kernels can quantify the type and degree of non-determinism present in communication patterns. To demonstrate our framework's ability to link runtime non-determinism to its root sources and quantify it, we present two case studies: an adaptive mesh refinement application, where our framework automatically quantifies the impact of function calls on non-determinism, and a Monte Carlo application, where our framework automatically quantifies the impact of parameter configurations on non-determinism.
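The abstract describes modeling each run as a directed graph and comparing runs with a graph kernel, but it does not name the kernel or the graph schema. The sketch below is only an illustration under stated assumptions: it uses a Weisfeiler-Lehman (WL) subtree kernel, one widely used graph kernel, and a hypothetical trace format in which nodes are MPI events labeled "send" or "recv", edges follow program order within a rank, and a message edge links each send to the receive it matched. The function names (`build_event_graph`, `wl_features`, `kernel_distance`, `mean_pairwise_distance`) are illustrative, not the framework's API.

```python
import math
from collections import Counter
from itertools import combinations


def build_event_graph(rank_events, messages):
    """Build a directed event graph from a hypothetical trace format.

    rank_events: {rank: [event_label, ...]} in program order.
    messages: [((src_rank, src_idx), (dst_rank, dst_idx)), ...] send -> recv matches.
    Returns (labels, preds): node labels and each node's predecessor list.
    """
    labels, preds = {}, {}
    for rank, events in rank_events.items():
        for i, label in enumerate(events):
            node = (rank, i)
            labels[node] = label
            # Program-order edge from the previous event on the same rank.
            preds[node] = [(rank, i - 1)] if i > 0 else []
    for send, recv in messages:
        preds[recv].append(send)  # message edge: send -> matching recv
    return labels, preds


def wl_features(labels, preds, iterations):
    """WL subtree features: counts of node labels over all relabeling rounds."""
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = old label + sorted multiset of predecessor labels.
        current = {
            node: current[node] + "(" + ",".join(sorted(current[p] for p in preds[node])) + ")"
            for node in current
        }
        features.update(current.values())
    return features


def wl_kernel(f1, f2):
    """Linear kernel on WL feature counts (dot product of count vectors)."""
    return sum(count * f2[label] for label, count in f1.items())


def kernel_distance(g1, g2, iterations=2):
    """Kernel-induced distance; 0.0 means the runs are indistinguishable to WL."""
    f1, f2 = wl_features(*g1, iterations), wl_features(*g2, iterations)
    sq = wl_kernel(f1, f1) + wl_kernel(f2, f2) - 2 * wl_kernel(f1, f2)
    return math.sqrt(max(sq, 0))


def mean_pairwise_distance(graphs, iterations=2):
    """Average kernel distance over all pairs of runs (0.0 if all runs match)."""
    pairs = list(combinations(graphs, 2))
    if not pairs:
        return 0.0
    return sum(kernel_distance(a, b, iterations) for a, b in pairs) / len(pairs)


# Rank 0 posts three wildcard receives; rank 1 sends once, rank 2 sends twice.
events = {0: ["recv", "recv", "recv"], 1: ["send"], 2: ["send", "send"]}
run_a = build_event_graph(events, [((1, 0), (0, 0)), ((2, 0), (0, 1)), ((2, 1), (0, 2))])
run_b = build_event_graph(events, [((2, 0), (0, 0)), ((2, 1), (0, 1)), ((1, 0), (0, 2))])
print(kernel_distance(run_a, run_a))  # 0.0
print(kernel_distance(run_a, run_b))  # 2.0
```

In this toy example both runs perform the same events and differ only in which sender each wildcard receive matched, which is the kind of run-to-run variation the abstract targets. Averaging distances over many runs, as `mean_pairwise_distance` does, is one plausible way to turn pairwise similarity into a single degree-of-non-determinism score; the paper's actual aggregation may differ.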
Year: 2021
DOI: 10.1109/TPDS.2021.3081530
Venue: IEEE Transactions on Parallel and Distributed Systems
Keywords: Non-determinism, reproducibility, debugging, trace analysis, graph similarity
DocType: Journal
Volume: 32
Issue: 12
ISSN: 1045-9219
Citations: 1
PageRank: 0.37
References: 0
Authors: 4
Name               Order   Citations   PageRank
Dylan Chapp        1       1           0.37
Nigel Tan          2       1           0.71
Sanjukta Bhowmick  3       1           0.37
Michela Taufer     4       352         53.04