Title
Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques.
Abstract
In this paper we describe the design of fault-tolerance capabilities for general-purpose offload semantics, based on the OmpSs programming model. Using ParaStation MPI, a production MPI-3.1 implementation, we explore which standard-compliant features an MPI stack must support to provide the necessary fault-tolerance guarantees, relying on MPI's dynamic process management. Our results, from both synthetic benchmarks and applications, show low runtime overhead and efficient recovery. They demonstrate that the existing MPI standard provided us with sufficient mechanisms to implement an effective and efficient fault-tolerant solution.
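The abstract does not spell out the recovery protocol. The Python sketch below is only a rough illustration of the general idea of automatic recovery for offloaded work. It assumes that offloaded tasks are side-effect free and can therefore be re-executed. It simulates a runtime that replaces a dead worker and resubmits the task that was lost with it. In a real MPI implementation, the replacement step would use dynamic process management (for example `MPI_Comm_spawn`). All class, method and parameter names here (`OffloadRuntime`, `run_task`, `crash_plan`, `max_retries`) are hypothetical. This is not the OmpSs or ParaStation MPI implementation.

```python
from dataclasses import dataclass, field


class WorkerFailure(Exception):
    """Raised when a (simulated) offload worker process dies mid-task."""


@dataclass
class Worker:
    wid: int
    crash_on: set = field(default_factory=set)  # hypothetical fault injection: task ids that kill this worker
    alive: bool = True

    def execute(self, task_id, fn, arg):
        if task_id in self.crash_on:
            self.alive = False
            raise WorkerFailure(f"worker {self.wid} died on task {task_id}")
        return fn(arg)


class OffloadRuntime:
    """Toy offload runtime. Tasks are assumed side-effect free, so a task lost
    together with a failed worker can simply be re-executed on a replacement."""

    def __init__(self, n_workers, crash_plan=None, max_retries=3):
        self.crash_plan = crash_plan or {}  # worker id -> task ids it crashes on
        self.next_wid = 0
        self.respawns = 0
        self.max_retries = max_retries
        self.workers = [self._spawn() for _ in range(n_workers)]

    def _spawn(self):
        # Stands in for MPI_Comm_spawn: create a fresh worker with a new id.
        wid = self.next_wid
        self.next_wid += 1
        return Worker(wid, set(self.crash_plan.get(wid, ())))

    def run_task(self, task_id, fn, arg):
        slot = task_id % len(self.workers)  # static round-robin placement
        for _ in range(self.max_retries + 1):
            try:
                return self.workers[slot].execute(task_id, fn, arg)
            except WorkerFailure:
                # Recovery: replace the dead worker, then resubmit the same task.
                self.workers[slot] = self._spawn()
                self.respawns += 1
        raise RuntimeError(f"task {task_id} failed after {self.max_retries} retries")
```

For example, `OffloadRuntime(n_workers=2, crash_plan={1: {3}})` kills worker 1 while it runs task 3. The runtime spawns worker 2 in its place and re-executes task 3 there, so the caller still receives every result. The retry limit keeps a persistently failing task from looping forever.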
Year: 2017
DOI: 10.1145/3079079.3079093
Venue: ICS
Field: Programming paradigm, Task parallelism, Computer science, Parallel computing, Real-time computing, Fault tolerance, Semantics, Distributed computing
DocType: Conference
Citations: 0
PageRank: 0.34
References: 22
Authors: 4
Name | Order | Citations | PageRank
Antonio J. Peña | 1 | 248 | 25.41
Vicenç Beltran | 2 | 82 | 13.74
Carsten Clauss | 3 | 93 | 12.35
Thomas Moschny | 4 | 37 | 4.18