Design of Fault Tolerant Pwrake Workflow System Supported by Gfarm File System. - Citegraph

Paper Info

Title
Design of Fault Tolerant Pwrake Workflow System Supported by Gfarm File System.

Abstract
We have been developing a lightweight workflow system called Pwrake to execute data-intensive many-task workflows using the high-performance parallel I/O of the Gfarm file system. This paper discusses the design of the fault tolerance mechanism implemented in Pwrake. To avoid aborting a workflow when a worker node fails, Pwrake detects node failures based on the result of retrying a task. To avoid losing files when a worker node fails, we use the automatic file replication of the Gfarm file system. To resume an interrupted workflow correctly, we introduce a Pwrake option that renames or removes the output file of a failed task. In the experiments, we confirmed that the overhead Gfarm automatic file replication adds to workflow execution time is less than 10%, and that the workflow continues and returns correct results even after an artificial failure is injected into a worker node.
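Why a failed task's output must be renamed or removed follows from how Rake-style tools decide what to rerun (Pwrake builds on Rake): a file task is skipped when its output already exists and is not older than its inputs. If a node dies after a task has written part of its output, a resumed run sees a "fresh" file and skips the task. The Python sketch below illustrates that hazard under this assumption only; it is not Pwrake's code, and `run_task`, `is_up_to_date`, and the `on_failure` policies are hypothetical names, not the actual Pwrake option.

```python
import os
import tempfile


def is_up_to_date(target, prereqs):
    # Make/Rake-style check (assumed): skip when the target exists and no
    # prerequisite is newer than it.
    if not os.path.exists(target):
        return False
    t = os.path.getmtime(target)
    return all(os.path.getmtime(p) <= t for p in prereqs)


def run_task(target, prereqs, action, on_failure="remove"):
    """Hypothetical task runner. on_failure is "remove", "rename", or "keep".

    Returns "skipped", "built", or "failed".
    """
    if is_up_to_date(target, prereqs):
        return "skipped"
    try:
        action(target)
        return "built"
    except Exception:
        # Clean up a partial output so a resumed run does not trust it.
        if os.path.exists(target):
            if on_failure == "remove":
                os.remove(target)
            elif on_failure == "rename":
                os.replace(target, target + ".failed")
        return "failed"


def demo(policy):
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "input.dat")
        out = os.path.join(d, "output.dat")
        with open(src, "w") as f:
            f.write("raw data\n")

        def crashing_action(path):
            # Simulates a task interrupted by a node failure mid-write.
            with open(path, "w") as f:
                f.write("partial")
            raise RuntimeError("worker node failed")

        def good_action(path):
            with open(path, "w") as f:
                f.write("complete result\n")

        first = run_task(out, [src], crashing_action, on_failure=policy)
        second = run_task(out, [src], good_action, on_failure=policy)  # resume
        with open(out) as f:
            content = f.read()
        return first, second, content


for policy in ("keep", "remove", "rename"):
    print(policy, demo(policy))
```

With the `keep` policy the resumed run skips the task and leaves the truncated file as the final result; with `remove` or `rename` the task is rebuilt. One plausible reason to prefer renaming is that the partial file stays available for inspection.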

Year: 2016
DOI: 10.1109/MTAGS.2016.7
Venue: MTAGS@SC
Keywords: Scientific Workflow System, Fault Tolerance, Distributed File System, Many-Task Computing
Document type: Conference
ISBN: 978-1-5090-5213-4
Citations: 0
PageRank: 0.34
References: 0
Number of authors: 2

Authors

Name	Order	Citations	PageRank
Masahiro Tanaka	1	0	0.34
Osamu Tatebe	2	309	42.94