Abstract | ||
---|---|---|
Scientific Workflow Management Systems (S-WFMS), such as Kepler, have proven to be important tools in scientific problem solving. Interestingly, S-WFMS fault tolerance and failure recovery are still open topics. Current approaches often involve classic fault-tolerance mechanisms, such as alternative versions and rollback with re-runs, and reliance on the fault-tolerance capabilities provided by subcomponents and lower layers such as schedulers, Grid and cloud resources, or the underlying operating systems. When failures occur at the underlying layers, a workflow system sees them as failed steps in the process, but frequently without additional detail. This limits the ability of an S-WFMS to recover from failures. We describe a lightweight end-to-end S-WFMS fault-tolerance framework, developed to handle failure patterns that occur in some real-life scientific workflows. Capabilities and limitations of the framework are discussed and assessed using simulations. The results show that the solution considerably increases workflow reliability and execution time stability. |
Year | DOI | Venue |
---|---|---|
2011 | 10.1109/HASE.2011.58 | HASE |
Keywords | Field | DocType |
workflow reliability,scientific problem,failure recovery,underlying operating system,classic fault-tolerance mechanism,failure pattern,high-assurance scientific workflows,s-wfms fault-tolerance,real-life scientific workflows,underlying layer,fault-tolerance capability,software fault tolerance,data model,data models,kepler,middleware,fault tolerant system,operating system,fault tolerant,management system,fault tolerance | Middleware,Workflow technology,Computer science,Software fault tolerance,Real-time computing,Fault tolerance,Workflow management system,Rollback,Workflow,Grid,Distributed computing | Conference |
Citations | PageRank | References |
4 | 0.40 | 20 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mladen A. Vouk | 1 | 452 | 49.92 |
Pierre A. Mouallem | 2 | 4 | 0.40 |