Abstract | ||
---|---|---|
Current workflow abstractions generally lack (a) an adequate approach to handling distributed data and (b) a proper separation of logical tasks and data flow from their mapping onto physical locations. As the complexity and dynamism of data and processing distribution have increased, optimized mapping of logical tasks to physical resources has become a necessity to avoid bottlenecks. We argue that the management of dynamic data and compute should become part of the runtime system of workflow engines, so that workflows can scale as necessary to address big data challenges and fully exploit distributed computing infrastructures (DCI). In this paper we explore how the P* model for pilot-abstractions, which proposes a clear separation between logical compute and data units and their realization as a job or a file on some physical resource, could provide these capabilities for such a runtime environment. The Pilot-API provides a general-purpose interface to pilot-abstractions and the ability to assign compute and data resources to them. We share our experience of re-implementing a DNA sequencing pipeline, used as a case study, with the Pilot-API. This first exercise, which resulted in a running application that is discussed here, illustrates the potential of this API to address (a) and (b). Our initial results indicate that pilot-abstractions (as captured by the P* model) offer an interesting approach to exploring the design of a new generation of workflow management systems and runtime environments that are capable of intelligently deciding on application-aware late binding to physical resources. |
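
The separation described in the abstract, between logical compute and data units and the physical resources they eventually run on, can be illustrated with a minimal sketch. This is not the Pilot-API itself: the names `DataUnit`, `ComputeUnit`, `Pilot`, and `late_bind`, the site labels, and the "prefer a pilot co-located with the input data" policy are all hypothetical, chosen only to show what application-aware late binding might look like.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataUnit:
    """Hypothetical logical data set; `sites` lists where replicas currently live."""
    name: str
    sites: set[str] = field(default_factory=set)


@dataclass
class ComputeUnit:
    """Hypothetical logical task; it names its input data but no physical location."""
    name: str
    input_data: DataUnit


@dataclass
class Pilot:
    """Hypothetical placeholder job holding `free_slots` on one physical site."""
    site: str
    free_slots: int


def late_bind(unit: ComputeUnit, pilots: list[Pilot]) -> Optional[Pilot]:
    """Choose a pilot at runtime, preferring one co-located with the input data."""
    available = [p for p in pilots if p.free_slots > 0]
    if not available:
        return None  # no capacity right now; the caller may retry later
    # Stable sort: pilots on a site that holds a replica sort first (key False).
    available.sort(key=lambda p: p.site not in unit.input_data.sites)
    chosen = available[0]
    chosen.free_slots -= 1  # the unit now occupies one slot on this pilot
    return chosen


# Assumed example data: one replica on "cluster-a", pilots on two sites.
reads = DataUnit("reads.fastq", sites={"cluster-a"})
align = ComputeUnit("align", input_data=reads)
pilots = [Pilot("cloud-b", free_slots=2), Pilot("cluster-a", free_slots=1)]

first = late_bind(align, pilots)   # co-located pilot is chosen
second = late_bind(align, pilots)  # cluster-a is full, so fall back to cloud-b
print(first.site, second.site)
```

The task definition never mentions a site; the binding decision is deferred until capacity and data placement are known, which is the property the abstract argues a workflow runtime should provide.
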
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/CCGrid.2013.17 | Cluster, Cloud and Grid Computing |
Keywords | Field | DocType |
DNA,biology computing,computational complexity,distributed processing,user interfaces,DCI,DNA sequencing pipeline,P* model,Pilot-API,application-aware late binding,data complexity,data dynamism,distributed computing infrastructures,distributed data,dynamic scientific workflows enactment,general-purpose interface,logical tasks,pilot-abstractions,processing distribution,runtime environments,workflow abstractions,workflow engines,workflow management systems,big data,distributed computing,pilot jobs,scientific workflows | Late binding,Workflow technology,Computer science,Windows Workflow Foundation,Dynamic data,Workflow engine,Workflow management system,Workflow,Runtime system,Distributed computing | Conference |
ISSN | ISBN | Citations |
2376-4414 | 978-1-4673-6465-2 | 1 |
PageRank | References | Authors |
0.35 | 17 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mark Santcroos | 1 | 70 | 8.11 |
Barbera van Schaik | 2 | 21 | 2.28 |
Shayan Shahand | 3 | 38 | 6.31 |
Sílvia Delgado Olabarriaga | 4 | 105 | 17.01 |
André Luckow | 5 | 84 | 10.58 |
Shantenu Jha | 6 | 188 | 32.40 |
van Schaik, B.D.C. | 7 | 1 | 0.35 |