Title
Declarative rules for inferring fine-grained data provenance from scientific workflow execution traces
Abstract
Fine-grained dependencies within scientific workflow provenance specify lineage relationships between a workflow result and the input data, intermediate data, and computation steps used in the result's derivation. This information is often needed to determine the quality and validity of scientific data, and as such, plays a key role in both provenance standardization efforts and provenance query frameworks. While most scientific workflow systems can record basic information concerning the execution of a workflow, they typically fall into one of three categories with respect to recording dependencies: (1) they rely on workflow computation steps to declare dependency relationships at runtime; (2) they impose implicit assumptions concerning dependency patterns from which dependencies are automatically inferred; or (3) they do not assert any dependency information at all. We present an alternative approach that decouples dependency inference from workflow systems and underlying execution traces. In particular, we present a high-level declarative language for expressing explicit dependency rules that can be applied (at any time) to workflow trace events to generate fine-grained dependency information. This approach not only makes provenance dependency rules explicit, but allows rules to be specified and refined by different users as needed. We present our dependency rule language and implementation that rewrites dependency rules into relational queries over underlying workflow traces. We also demonstrate the language using common types of dependency patterns found within scientific workflows.
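The idea of rewriting a dependency rule into a relational query over trace events can be illustrated with a minimal sketch. It does not reproduce the paper's rule language or implementation. The `events` schema, the step names, and the `rule_to_sql` and `infer_dependencies` helpers are all hypothetical, and the single rule shown ("each output of an invocation depends on each input of the same invocation") is just one common dependency pattern chosen for illustration.

```python
import sqlite3

# Hypothetical trace: one row per read/write event of a step invocation.
# Columns: (step, invocation, kind, item). This is not the paper's trace model.
EVENTS = [
    ("align", 1, "read", "seq_a"),
    ("align", 1, "read", "seq_b"),
    ("align", 1, "write", "alignment"),
    ("tree", 1, "read", "alignment"),
    ("tree", 1, "write", "phylo_tree"),
]


def rule_to_sql(step):
    """Rewrite the illustrative rule 'for <step>, each output depends on each
    input of the same invocation' into a self-join over the events table."""
    sql = (
        "SELECT w.item AS derived, r.item AS source "
        "FROM events w JOIN events r "
        "ON w.step = r.step AND w.invocation = r.invocation "
        "WHERE w.kind = 'write' AND r.kind = 'read' AND w.step = ?"
    )
    return sql, (step,)


def infer_dependencies(events, steps):
    """Load the trace into an in-memory database and apply the rule per step."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (step TEXT, invocation INTEGER, kind TEXT, item TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
    deps = set()
    for step in steps:
        sql, params = rule_to_sql(step)
        deps.update(conn.execute(sql, params).fetchall())
    conn.close()
    return deps


# Rules are applied after the fact, to an already recorded trace.
deps = infer_dependencies(EVENTS, ["align", "tree"])
```

Because the rule is applied to a stored trace rather than asserted at runtime, a different user could swap in a narrower rule for a given step (for example, one that pairs outputs with only some inputs) without re-running the workflow.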
Year: 2012
DOI: 10.1007/978-3-642-34222-6_7
Venue: IPAW
Keywords: rewrites dependency rule, fine-grained dependency, decouples dependency inference, explicit dependency rule, scientific workflow execution trace, fine-grained dependency information, dependency relationship, dependency pattern, dependency rule language, dependency information, fine-grained data provenance, declarative rule, provenance dependency rule
Field: Data mining, Workflow technology, Dependency information, Programming language, Inference, Computer science, Provenance, Declarative programming, Standardization, Workflow, Database, Computation
DocType: Conference
Volume: 7525
ISSN: 0302-9743
Citations: 12
PageRank: 0.67
References: 14
Authors: 3
Name, Order, Citations, PageRank
Shawn Bowers, 1, 1223, 86.44
Timothy McPhillips, 2, 262, 14.14
Bertram Ludäscher, 3, 1879, 239.67