Abstract |
---|
Large-scale scientific workflows are often characterized by tasks that produce or consume large amounts of data (frequently both) and generate large volumes of derived data products. Minimizing the end-to-end running time of a set of workflow tasks is important to deliver data products in a timely manner and free up processors to accommodate additional workflows. A single workflow task may perform the same computations on multiple files, presenting many opportunities for concurrent execution on multiple nodes of a Grid. In addition, many different tasks may operate on the same large input files. An important challenge to efficient workflow execution on multiple nodes is determining an assignment of tasks to nodes. Processor and network speeds may vary at different times, workflow tasks may be modified, and new workflows may be added. In this paper we examine algorithms for scheduling tasks concurrently on nodes of a dedicated Grid to address these challenges. We use real workflow tasks from the CORIE Environmental Observation and Forecasting System. We propose a hybrid scheduling approach that exploits knowledge of task running times and locations of input files to assign some tasks to nodes statically, while others are assigned dynamically to adapt to variations in task execution times. We show the effectiveness of our approach using both simulations and our prototype implementation. |
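
The hybrid idea in the abstract (pin some tasks to nodes up front, let the rest be picked up by whichever node frees up first) can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' algorithm: the `Task` fields, the rule "statically pin a task to the node that already holds its input file", the longest-estimate-first queue order, and the names `hybrid_schedule` and `file_location` are all assumptions made for illustration.

```python
import heapq
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Task:
    name: str
    est_time: float       # running-time estimate known before execution (assumed)
    actual_time: float    # time the task really takes when it runs
    input_file: Optional[str] = None  # large input file, if any


def hybrid_schedule(
    tasks: List[Task],
    nodes: List[str],
    file_location: Dict[str, str],  # hypothetical map: input file -> node holding it
) -> Tuple[Dict[str, List[str]], float]:
    """Simulate a hybrid static/dynamic schedule.

    Returns (node -> task names in execution order, makespan).
    """
    assignment: Dict[str, List[str]] = {n: [] for n in nodes}
    busy_until: Dict[str, float] = {n: 0.0 for n in nodes}
    dynamic: List[Task] = []

    # Static phase: a task whose input file already resides on a node is
    # pinned to that node, so the large file is never transferred. The
    # simulated clock advances by the task's actual running time.
    for t in tasks:
        node = file_location.get(t.input_file) if t.input_file else None
        if node in assignment:
            assignment[node].append(t.name)
            busy_until[node] += t.actual_time
        else:
            dynamic.append(t)

    # Dynamic phase: remaining tasks wait in a shared queue, longest estimate
    # first; the node that becomes free earliest takes the next task, which
    # absorbs differences between estimated and actual running times.
    queue = deque(sorted(dynamic, key=lambda t: t.est_time, reverse=True))
    heap = [(busy_until[n], n) for n in nodes]
    heapq.heapify(heap)
    while queue:
        free_at, node = heapq.heappop(heap)
        task = queue.popleft()
        assignment[node].append(task.name)
        heapq.heappush(heap, (free_at + task.actual_time, node))

    makespan = max(t for t, _ in heap)
    return assignment, makespan


# Usage with made-up tasks: t1 and t2 are pinned by file location,
# t3 and t4 are handed out dynamically.
tasks = [
    Task("t1", 4.0, 5.0, "a.dat"),
    Task("t2", 3.0, 3.0, "b.dat"),
    Task("t3", 2.0, 1.0),
    Task("t4", 6.0, 6.0),
]
assignment, makespan = hybrid_schedule(tasks, ["n1", "n2"], {"a.dat": "n1", "b.dat": "n2"})
print(assignment)  # {'n1': ['t1', 't3'], 'n2': ['t2', 't4']}
print(makespan)    # 9.0
```

In this toy run, node `n2` finishes its pinned task first and therefore takes the longest dynamic task; a purely static plan built only from the estimates could not react to `t1` running longer than predicted.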
Year | Venue | Keywords |
---|---|---|
2005 | SSDBM | efficient scheduling,scientific workflow task |
Field | DocType | ISBN |
---|---|---|
Workflow technology,Computer science,Scheduling (computing),Workflow engine,Workflow,Workflow management system,Database | Conference | 1-88888-111-X |
Citations | PageRank | References |
---|---|---|
2 | 0.44 | 18 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Laura Bright | 1 | 176 | 17.34 |
David Maier | 2 | 5639 | 1666.90 |