Title
Exploring many task computing in scientific workflows
Abstract
One of the main advantages of using a scientific workflow management system (SWfMS) to orchestrate data flows among scientific activities is the ability to control and register the whole workflow execution. Executing workflow activities on high performance computing (HPC) resources presents challenges for SWfMS execution control. Current solutions leave the scheduling to the HPC queue system. Since the workflow execution engine does not run on remote clusters, the SWfMS is not aware of the parallelization strategy of the workflow execution. Consequently, remote execution control and the provenance registry of the parallel activities are very limited from the SWfMS side. This work presents a set of components to be included in the workflow specification of any SWfMS to control the parallelization of activities as many-task computing (MTC). In addition, these components can gather provenance data during remote workflow execution. Through these MTC components, the parallelization strategy can be registered and reused, and provenance data can be uniformly queried. We have evaluated our approach by performing parameter sweep parallelization in solving the incompressible 3D Navier-Stokes equations. Experimental results show performance gains together with the additional benefit of distributed provenance support.
Year
2009
DOI
10.1145/1646468.1646470
Venue
SC-MTAGS
Keywords
data flow, parallelization, management system, computational fluid dynamics
Field
Many-task computing, Workflow technology, Supercomputer, Computer science, Scheduling (computing), Queue, Real-time computing, Workflow engine, Workflow, Workflow management system, Distributed computing
DocType
Conference
Citations
21
PageRank
0.91
References
25
Authors
8