Title
On High-Assurance Scientific Workflows
Abstract
Scientific Workflow Management Systems (S-WFMS), such as Kepler, have proven to be important tools in scientific problem solving. Interestingly, S-WFMS fault tolerance and failure recovery are still an open topic. They often rely on classic fault-tolerance mechanisms, such as alternative versions and rollback with re-runs, and on the fault-tolerance capabilities provided by subcomponents and lower layers, such as schedulers, Grid and cloud resources, or the underlying operating systems. When failures occur at the underlying layers, a workflow system sees them as failed steps in the process, frequently without additional detail. This limits the ability of an S-WFMS to recover from failures. We describe a lightweight end-to-end S-WFMS fault-tolerance framework developed to handle failure patterns that occur in some real-life scientific workflows. The capabilities and limitations of the framework are discussed and assessed using simulations. The results show that the solution considerably increases workflow reliability and execution-time stability.
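The abstract names two classic mechanisms, alternative versions and rollback with re-runs. The following is a minimal, generic sketch of how the two can be combined for a single workflow step. It is not the authors' framework (the abstract gives no implementation detail), and every name in it (`run_step_with_recovery`, `StepFailed`, `max_reruns`, `primary`, `alternative`) is hypothetical. It assumes a step's state fits in a dictionary that can be checkpointed by a shallow copy, and that a failed step surfaces only as an exception with little detail, as the abstract describes.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


class StepFailed(Exception):
    """Hypothetical error a workflow step raises when it cannot complete."""


def run_step_with_recovery(
    versions: List[Callable[[State], State]],
    state: State,
    max_reruns: int = 2,
) -> State:
    """Try each alternative version in order; re-run each one up to
    `max_reruns` extra times, rolling back to a checkpoint before every attempt."""
    checkpoint = dict(state)  # shallow snapshot used for rollback (assumption: shallow is enough)
    errors: List[str] = []
    for version in versions:
        name = getattr(version, "__name__", repr(version))
        for attempt in range(1 + max_reruns):
            working = dict(checkpoint)  # rollback: each attempt starts from the checkpoint
            try:
                return version(working)
            except StepFailed as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
    raise StepFailed("all versions and re-runs failed: " + "; ".join(errors))


# Usage: a primary version that keeps failing, plus an alternative version.
calls = {"primary": 0}


def primary(s: State) -> State:
    calls["primary"] += 1
    s["touched"] = True  # partial side effect; discarded by the rollback
    if calls["primary"] <= 3:
        raise StepFailed("scheduler lost the job")
    return {**s, "result": "primary"}


def alternative(s: State) -> State:
    return {**s, "result": "alternative"}


out = run_step_with_recovery([primary, alternative], {"input": 42}, max_reruns=2)
print(out)
```

With `max_reruns=2` the primary version is attempted three times, fails each time, and the alternative version then runs from the rolled-back checkpoint, so the primary's partial `touched` entry never reaches the result. Recording each attempt's error text is a small step toward the richer failure detail the abstract says workflow systems usually lack.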
Year: 2011
DOI: 10.1109/HASE.2011.58
Venue: HASE
Keywords: workflow reliability, scientific problem, failure recovery, underlying operating system, classic fault-tolerance mechanism, failure pattern, high-assurance scientific workflows, s-wfms fault-tolerance, real-life scientific workflows, underlying layer, fault-tolerance capability, software fault tolerance, data model, data models, kepler, middleware, fault tolerant system, operating system, fault tolerant, management system, fault tolerance
Field: Middleware, Workflow technology, Computer science, Software fault tolerance, Real-time computing, Fault tolerance, Workflow management system, Rollback, Workflow, Grid, Distributed computing
DocType: Conference
Citations: 4
PageRank: 0.40
References: 20
Authors: 2
Name | Order | Citations | PageRank
Mladen A. Vouk | 1 | 452 | 49.92
Pierre A. Mouallem | 2 | 4 | 0.40