Title
Efficient scheduling and execution of scientific workflow tasks
Abstract
Large-scale scientific workflows are often characterized by tasks that produce or consume large amounts of data (frequently both) and generate large volumes of derived data products. Minimizing the end-to-end running time of a set of workflow tasks is important to deliver data products in a timely manner and free up processors to accommodate additional workflows. A single workflow task may perform the same computations on multiple files, presenting many opportunities for concurrent execution on multiple nodes of a Grid. In addition, many different tasks may operate on the same large input files. An important challenge to efficient workflow execution on multiple nodes is determining an assignment of tasks to nodes. Processor and network speeds may vary at different times, workflow tasks may be modified, and new workflows may be added. In this paper we examine algorithms for scheduling tasks concurrently on nodes of a dedicated Grid to address these challenges. We use real workflow tasks from the CORIE Environmental Observation and Forecasting System. We propose a hybrid scheduling approach that exploits knowledge of task running times and locations of input files to assign some tasks to nodes statically, while others are assigned dynamically to adapt to variations in task execution times. We show the effectiveness of our approach using both simulations and our prototype implementation.
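The hybrid idea described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm; it is one plausible reading under stated assumptions: the tasks with the longest estimated running times are placed statically with a greedy rule that prefers the node already holding the input file, and the remaining tasks are handed out dynamically to whichever node becomes free first. The names `Task`, `hybrid_schedule`, `static_fraction`, and `transfer_cost` are hypothetical and do not come from the paper.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    est: float       # estimated running time, known to the scheduler
    actual: float    # actual running time, only observed at execution
    input_node: int  # node that already stores the task's input file


def hybrid_schedule(tasks, n_nodes, static_fraction=0.5, transfer_cost=0.0):
    """Illustrative hybrid scheduler (an assumption, not the paper's method)."""

    def penalty(task, node):
        # Extra time paid when the input file is not local to the node.
        return 0.0 if node == task.input_node else transfer_cost

    # Longest estimated tasks are planned statically, the rest dynamically.
    ordered = sorted(tasks, key=lambda t: t.est, reverse=True)
    n_static = int(len(ordered) * static_fraction)
    static, dynamic = ordered[:n_static], ordered[n_static:]

    # Static phase: greedy placement on the node with the smallest planned
    # finish time, counting the transfer penalty for non-local input.
    planned = [0.0] * n_nodes
    assignment = {n: [] for n in range(n_nodes)}
    for t in static:
        best = min(range(n_nodes), key=lambda n: planned[n] + penalty(t, n))
        planned[best] += t.est + penalty(t, best)
        assignment[best].append(t)

    # Execution: static work runs with *actual* times, which may differ
    # from the estimates used for planning.
    free_at = [
        (sum(t.actual + penalty(t, n) for t in assignment[n]), n)
        for n in range(n_nodes)
    ]
    heapq.heapify(free_at)

    # Dynamic phase: the node that becomes free first takes the next task.
    for t in dynamic:
        when, n = heapq.heappop(free_at)
        assignment[n].append(t)
        heapq.heappush(free_at, (when + t.actual + penalty(t, n), n))

    makespan = max(when for when, _ in free_at)
    return assignment, makespan
```

In this sketch, raising `static_fraction` leans on the running-time estimates and file locations, while lowering it lets more work adapt to observed variation in execution times; the right balance is exactly the kind of question the paper evaluates experimentally.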
Year
2005
Venue
SSDBM
Keywords
efficient scheduling, scientific workflow task
Field
Workflow technology, Computer science, Scheduling (computing), Workflow engine, Workflow, Workflow management system, Database
DocType
Conference
ISBN
1-88888-111-X
Citations
2
PageRank
0.44
References
18
Authors
2
Name
Order
Citations
PageRank
Laura Bright117617.34
David Maier256391666.90