Title
Impact of the Union and Difference Operations on the Quality of Information Products
Abstract
Information derived from relational databases is routinely used for decision making. However, little thought is usually given to the quality of the source data, its impact on the quality of the derived information, and how this in turn affects decisions. To assess quality, one needs a framework that defines the relevant metrics constituting the quality profile of a relation and provides mechanisms for their evaluation. We build on a quality framework proposed in prior work, and develop quality profiles for the results of the primitive relational operations Difference and Union. These operations have nuances that make both the classification of the resulting records and the estimation of the size of each class difficult to address, and very different from the treatment of other operations. We first determine how tuples appearing in the results of these operations should be classified as accurate, inaccurate, or mismember, and when tuples that should appear in the result do not (called incomplete). Although estimating the cardinalities of these subsets directly is difficult, we resolve this by decomposing the problem into a sequence of drawing processes, each of which follows a hypergeometric distribution. Finally, we discuss how decisions would be influenced by the resulting quality profiles.
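As an illustration of the kind of drawing process the abstract mentions, the following minimal Python sketch computes a hypergeometric probability and its expected value. It is not the authors' estimation procedure: the function names and the numbers (a relation of 100 tuples, 80 of them accurate, 30 tuples drawn) are hypothetical and chosen only to show how drawing without replacement behaves.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """Probability of exactly k tuples of the class of interest among
    `draws` tuples drawn without replacement from `population` tuples,
    of which `successes` belong to that class."""
    lowest = max(0, draws - (population - successes))
    highest = min(draws, successes)
    if k < lowest or k > highest:
        return 0.0
    return (comb(successes, k)
            * comb(population - successes, draws - k)
            / comb(population, draws))


def hypergeom_mean(population, successes, draws):
    """Expected number of drawn tuples that belong to the class of interest."""
    return draws * successes / population


# Hypothetical numbers, for illustration only.
N, K, n = 100, 80, 30
expected_accurate = hypergeom_mean(N, K, n)
total_probability = sum(hypergeom_pmf(k, N, K, n) for k in range(n + 1))
mean_from_pmf = sum(k * hypergeom_pmf(k, N, K, n) for k in range(n + 1))
```

Under these assumed numbers, the expected count of accurate tuples among the 30 drawn is 30 × 80 / 100 = 24, and the probabilities over all feasible counts sum to one.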
Year
2009
DOI
10.1287/isre.1070.0161
Venue
Information Systems Research
Keywords
quality framework, difference operations, source data, relevant metrics, hyper-geometric distribution, relational databases, different class, information products, prior work, primitive relational operation, resulting quality profile, quality profile, relational data model, quality of information
Field
Data mining, Economics, Data quality, Information retrieval, Relational database, Tuple, Source data, Database marketing, Cardinality, Knowledge management, Relational model, Information quality
DocType
Journal
Volume
20
Issue
1
ISSN
1047-7047
Citations
5
PageRank
0.46
References
15
Authors
3
Name                 Order    Citations PageRank
Amir Parssian        1        1205.87
Sumit Sarkar         2        835260.90
Varghese S. Jacob    3        39234.13