Title
Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset
Abstract
Reproducibility, the ability to repeat program executions with the same numerical result or code behavior, is crucial for computational science and engineering applications. However, non-determinism in concurrency scheduling often hampers achieving this ability on high performance computing (HPC) systems. To aid in managing the adverse effects of non-determinism, prior work has provided techniques to achieve bit-precise reproducibility, but most of them focus only on small-scale parallelism. While scalable techniques recently emerged, they are disparate and target special purposes, e.g., single-schedule domains. On current systems with O(10^6) compute cores and future ones with O(10^9), any technique that does not embrace a unified, targeted, and multilevel approach will fall short of providing reproducibility. In this paper, we argue for a common toolset that embodies this approach, where programmers select and compose complementary tools and can effectively, yet scalably, analyze, control, and eliminate sources of non-determinism at scale. This allows users to gain reproducibility only to the levels demanded by specific code development needs. We present our research agenda and ongoing work toward this goal.
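One reason scheduling non-determinism threatens bit-precise numerical reproducibility is that floating-point addition is not associative: if the order in which partial results are accumulated depends on which thread or process finishes first, the final bits can differ from run to run. The sketch below is an illustration added here, not code from the paper. It models each possible schedule, as a simplifying assumption, as a permutation of the summation order; the function name `reduce_in_order` is hypothetical.

```python
from itertools import permutations


def reduce_in_order(values, order):
    # Accumulate left to right in the given index order. Each order stands in
    # for one possible schedule of a parallel reduction (a simplification).
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [1e16, 1.0, -1e16, 1.0]

# Collect the distinct results over every possible accumulation order.
results = {reduce_in_order(values, p) for p in permutations(range(len(values)))}
print(sorted(results))  # prints [0.0, 1.0, 2.0]
```

The exact sum is 2, but because 1e16 + 1.0 rounds back to 1e16 in IEEE 754 double precision, orders that add a 1.0 to a large partial sum lose it. The same four inputs therefore yield three different results depending only on accumulation order.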
Year: 2013
DOI: 10.1145/2532352.2532357
Venue: SE-HPCCSE@SC
Field: Computational Science and Engineering, Reproducibility, Extreme scale, Software engineering, Supercomputer, Computer science, Concurrency, Scheduling (computing), Software requirements specification, Scalability
DocType: Conference
Citations: 4
PageRank: 0.41
References: 11
Authors: 6
Name                   Order  Citations  PageRank
Dong H. Ahn            1      325        22.61
Gregory L. Lee         2      199        14.30
Ganesh Gopalakrishnan  3      1619       130.11
Zvonimir Rakamarić     4      135        7.41
Martin Schulz          5      2227       129.64
Ignacio Laguna         6      239        24.56