Title
A framework for fine-grained data integration and curation, with provenance, in a dataspace
Abstract
Some tasks in a dataspace (a loose collection of heterogeneous data sources) require integration of fine-grained data from diverse sources. This work is often done by end users knowledgeable about the domain, who copy and paste data into a spreadsheet or other existing application. Inspired by this kind of work, we define a data curation setting characterized by data that are explicitly selected, copied, and then pasted into a target dataset, where they can be confirmed or replaced. Rows and columns in the target may also be combined, for example, when they are redundant. Each of these actions is an integration decision, often of high quality, and taken together these decisions constitute the provenance of a data value in the target. We define a conceptual model for data and provenance for these user actions, and we show how questions about data provenance can be answered. We note that our model can be used in automated data curation as well as in a setting with the manual activity we emphasize in our examples.
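To make the setting in the abstract concrete, the following is a minimal sketch of how copy-and-paste actions might be logged so that the provenance of a target value can be queried. It is not the paper's conceptual model: the names `Action`, `Target`, `paste`, and `provenance`, the action kinds, and the rule for telling a confirmation from a replacement are all assumptions made for illustration. Combining rows and columns is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical addressing scheme: (row key, column name) in the target dataset.
Cell = Tuple[str, str]


@dataclass(frozen=True)
class Action:
    """One recorded user action (illustrative, not the paper's definition)."""
    kind: str    # assumed labels: "paste", "confirm", or "replace"
    cell: Cell   # target cell the action applies to
    value: str   # value placed in the cell
    source: str  # identifier of the source the value was copied from


@dataclass
class Target:
    """A target dataset that keeps a log of every paste made into it."""
    values: Dict[Cell, str] = field(default_factory=dict)
    log: List[Action] = field(default_factory=list)

    def paste(self, cell: Cell, value: str, source: str) -> Action:
        # Assumption: pasting an equal value confirms the cell,
        # pasting a different value replaces it.
        if cell not in self.values:
            kind = "paste"
        elif self.values[cell] == value:
            kind = "confirm"
        else:
            kind = "replace"
        action = Action(kind, cell, value, source)
        self.values[cell] = value
        self.log.append(action)
        return action

    def provenance(self, cell: Cell) -> List[Action]:
        # The provenance of a cell is the ordered list of actions on it.
        return [a for a in self.log if a.cell == cell]


target = Target()
cell = ("row1", "population")
target.paste(cell, "1200", "source_a")
target.paste(cell, "1200", "source_b")
target.paste(cell, "1250", "source_c")
history = [(a.kind, a.source) for a in target.provenance(cell)]
```

Under these assumptions, `history` lists which source first supplied the value, which source confirmed it, and which source later replaced it, which is the kind of provenance question the abstract refers to.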
Year
2009
Venue
Workshop on the Theory and Practice of Provenance
Keywords
fine-grained data integration, integration decision, automated data, conceptual model, data provenance, end user, heterogeneous data source, data value, diverse source, target dataset, fine-grained data, data integrity
Field
Data integration, World Wide Web, Conceptual model, Information retrieval, End user, Computer science, Data curation, Provenance
DocType
Conference
Citations
10
PageRank
0.54
References
13
Authors
3
Name
Order
Citations
PageRank
David W. Archer1505.28
Lois M. L. Delcambre2992420.78
David Maier356391666.90