Title
Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems
Abstract
The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc: they involve an iterative development process in which users compose and test their workflows on desktops and then scale up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, and merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, showing that Tigres performs with minimal template overheads (mean of 13 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.
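The four template patterns named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the Tigres API: the function names `sequence`, `parallel`, `split`, `merge`, and `run_pipeline`, their signatures, and the use of a thread pool are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers illustrating the four patterns; not the Tigres API.


def sequence(tasks, value):
    """Run tasks one after another, feeding each output to the next task."""
    for task in tasks:
        value = task(value)
    return value


def parallel(task, inputs):
    """Run the same task on every input concurrently; results keep input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(task, inputs))


def split(split_task, value, worker):
    """Run one task that yields several items, then process the items in parallel."""
    return parallel(worker, split_task(value))


def merge(worker, inputs, merge_task):
    """Process the inputs in parallel, then combine the results with one task."""
    return merge_task(parallel(worker, inputs))


def run_pipeline(text):
    """Toy pipeline: split text into words, count characters, post-process the total."""
    words = split(str.split, text, str.upper)        # one input, many outputs
    total = merge(len, words, sum)                    # many inputs, one output
    return sequence([lambda n: n * 2, str], total)    # chained steps
```

Calling `run_pipeline("a bb ccc")` chains all four patterns on a desktop-sized input; in this sketch, scaling up would mean swapping the thread pool for another execution mechanism without changing how the pipeline is composed.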
Year: 2016
DOI: 10.1109/CCGrid.2016.54
Venue: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Keywords: Scientific Workflows, High Performance Computing, Data Analysis
Field: Workflow technology, Iterative and incremental development, Supercomputer, Computer science, Application programming interface, Template, Workflow engine, Workflow, Workflow management system, Distributed computing
DocType: Conference
ISSN: 2376-4414
ISBN: 978-1-5090-2454-4
Citations: 4
PageRank: 0.44
References: 14
Authors: 4
Name                    Order   Citations   PageRank
Valerie Hendrix         1       4           1.12
James Fox               2       4           0.44
Devarshi Ghoshal        3       57          8.83
Lavanya Ramakrishnan    4       710         56.18