Title |
---|
Distem: Evaluation of Fault Tolerance and Load Balancing Strategies in Real HPC Runtimes through Emulation |
Abstract |
---|
The era of Exascale computing raises new challenges for HPC. Intrinsic characteristics of these extreme-scale platforms bring energy and reliability issues. To cope with these constraints, applications will have to be more flexible in order to adapt to changes in platform geometry and to unavoidable failures. Thus, to prepare for this upcoming era, a strong effort must be made to improve the HPC software stack. This work focuses on improving the study of a central part of that software stack: HPC runtimes. To this end, we propose a set of extensions to the Distem emulator that enable the evaluation of fault tolerance and load balancing mechanisms in such runtimes. Extensive experiments demonstrating the benefits of our approach have been performed with three HPC runtimes: Charm++, MPICH, and OpenMPI. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/CCGrid.2016.35 | 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) |
Keywords | Field | DocType |
---|---|---|
Experimentation,HPC runtimes,Fault tolerance,Load balancing,Emulation | Exascale computing,MPICH,Extreme scale,Load balancing (computing),Computer science,Software fault tolerance,Real-time computing,Software,Fault tolerance,Emulation,Distributed computing | Conference |
ISSN | ISBN | Citations |
---|---|---|
2376-4414 | 978-1-5090-2454-4 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 13 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Cristian Ruiz | 1 | 30 | 2.75 |
Joseph Emeras | 2 | 18 | 2.94 |
Emmanuel Jeanvoine | 3 | 83 | 6.75 |
Lucas Nussbaum | 4 | 145 | 15.18 |