Abstract |
---|
One of the key functionalities provided by Grid systems is the remote execution of applications. This paper introduces a research proposal on fault-tolerance mechanisms for the execution of sequential and message-passing parallel applications on the Grid. A service-based architecture called CPPC-G is proposed. The CPPC (Controller/Precompiler for Portable Checkpointing) framework is used to insert checkpointing instrumentation into the application code. CPPC-G services will be in charge of the submission and monitoring of the application execution, management of checkpoint files generated by CPPC-enabled applications, and detection and automatic restart of failed executions. The development of the CPPC-G architecture will involve research in different areas, such as the storage and management of data files (checkpoint files); the automatic selection of suitable computing resources; the reliable detection of execution failures; and robustness issues, so that the architecture is itself fault-tolerant. |
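
The detect-and-restart behaviour described in the abstract can be illustrated with a minimal sketch. This is not the CPPC-G implementation: `CheckpointStore`, `JobFailed`, `run_with_restart`, and `make_flaky_job` are hypothetical names introduced here, checkpoints are kept in memory rather than in files, and a checkpoint is assumed after every step for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class JobFailed(Exception):
    """Raised by a simulated job when its execution fails (hypothetical)."""


@dataclass
class CheckpointStore:
    """Hypothetical in-memory stand-in for checkpoint file management."""
    latest: Optional[int] = None

    def save(self, step: int) -> None:
        self.latest = step

    def restart_point(self) -> int:
        # Resume after the last saved step, or from the beginning.
        return 0 if self.latest is None else self.latest + 1


def run_with_restart(job: Callable[[int], None], store: CheckpointStore,
                     total_steps: int, max_restarts: int = 3) -> int:
    """Run a job step by step; on failure, restart from the last checkpoint.

    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            for step in range(store.restart_point(), total_steps):
                job(step)
                store.save(step)  # assumption: checkpoint after every step
            return restarts
        except JobFailed:
            if restarts >= max_restarts:
                raise  # give up after too many failed attempts
            restarts += 1


def make_flaky_job(fail_at: int) -> Tuple[Callable[[int], None], List[int]]:
    """Build a simulated job that fails once, the first time it reaches fail_at."""
    executed: List[int] = []
    state = {"failed_once": False}

    def job(step: int) -> None:
        if step == fail_at and not state["failed_once"]:
            state["failed_once"] = True
            raise JobFailed(f"simulated failure at step {step}")
        executed.append(step)

    return job, executed
```

In this sketch a job that fails once at step 3 of 5 is restarted from step 3 rather than from step 0, which is the saving that checkpoint-based restart is meant to provide.
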
Year | DOI | Venue |
---|---|---|
2008 | 10.1109/CCGRID.2008.38 | CCGrid |
Keywords | Field | DocType |
---|---|---|
parallel programming, parallel computation, automatic control, grid computing, computer architecture, parallel computer, fault tolerance, message passing, system monitoring, fault tolerant | Control theory, Grid computing, Computer science, Real-time computing, System monitoring, Robustness (computer science), Fault tolerance, Data file, Message passing, Grid, Distributed computing | Conference |
ISSN | Citations | PageRank |
---|---|---|
2376-4414 | 1 | 0.41 |
References | Authors |
---|---|
12 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Daniel Díaz | 1 | 43 | 4.19 |
Xoán C. Pardo | 2 | 22 | 5.92 |
María J. Martín | 3 | 174 | 27.68 |
Patricia González | 4 | 78 | 13.06 |