Paper Info

Title
Bridging workflow and data provenance using strong links

Abstract
As scientists continue to migrate their work to computational methods, it is important to track not only the steps involved in the computation but also the data consumed and produced. While this provenance information can be captured, in existing approaches, it often contains only weak references between data and provenance. When data files or provenance are moved or modified, it can be difficult to find the data associated with the provenance or to find the provenance associated with the data. We propose a persistent storage mechanism that manages input, intermediate, and output data files, strengthening the links between provenance and data. This mechanism provides better support for reproducibility because it ensures the data referenced in provenance information can be readily located. Another important benefit of such management is that it allows caching of intermediate data which can then be shared with other users. We present an implemented infrastructure for managing data in a provenance-aware manner and demonstrate its application in scientific projects.

Year	2010
Venue	SSDBM
Keywords	scientific project, provenance-aware manner, persistent storage mechanism, data file, weak reference, better support, strong link, output data file, important benefit, intermediate data, provenance information
Field	Data mining, Computer science, Bridging (networking), Provenance, Data file, Workflow, Database, Computation
DocType	Conference
Volume	6187
ISSN	0302-9743
ISBN	3-642-13817-9
Citations	18
PageRank	1.29
References	14
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
David Koop	1	702	42.47
Emanuele Santos	2	939	52.64
Bela Bauer	3	46	5.00
Matthias Troyer	4	120	19.62
Juliana Freire	5	3956	270.89
Cláudio T. Silva	6	5054	290.90
