Title
Indexing Uncertain Categorical Data
Abstract
Uncertainty in categorical data is commonplace in many applications, including data cleaning, database integration, and biological annotation. In such domains, the correct value of an attribute is often unknown, but may be selected from a reasonable number of alternatives. Current database management systems do not provide a convenient means for representing or manipulating this type of uncertainty. In this paper we extend traditional systems to explicitly handle uncertainty in data values. We propose two index structures for efficiently searching uncertain categorical data, one based on the R-tree and another based on an inverted index structure. Using these structures, we provide a detailed description of the probabilistic equality queries they support. Experimental results using real and synthetic datasets demonstrate how these index structures can effectively improve the performance of queries through the use of internal probabilistic information.
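The inverted-index idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each uncertain attribute is a discrete probability distribution over categories, that distributions are independent, and that the probability of two values being equal is the sum over categories of the product of their probabilities. The names `build_inverted_index` and `equality_threshold_query` and the threshold parameter `tau` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def build_inverted_index(tuples):
    """Map each category to a list of (tuple_id, probability) postings.

    `tuples` is a dict: tuple_id -> {category: probability}.
    Postings are sorted by descending probability (an assumed layout).
    """
    index = defaultdict(list)
    for tid, dist in tuples.items():
        for category, p in dist.items():
            if p > 0:
                index[category].append((tid, p))
    for postings in index.values():
        postings.sort(key=lambda entry: entry[1], reverse=True)
    return index


def equality_threshold_query(index, query, tau):
    """Return {tuple_id: Pr(query == tuple)} for tuples scoring at least tau.

    Assumes independence, so the equality probability is the sum over
    shared categories of query_prob * tuple_prob.
    """
    scores = defaultdict(float)
    for category, q_p in query.items():
        # Only the posting lists of categories present in the query are read.
        for tid, p in index.get(category, []):
            scores[tid] += q_p * p
    return {tid: s for tid, s in scores.items() if s >= tau}


if __name__ == "__main__":
    data = {
        "t1": {"A": 0.5, "B": 0.5},
        "t2": {"B": 1.0},
        "t3": {"C": 1.0},
    }
    idx = build_inverted_index(data)
    print(equality_threshold_query(idx, {"A": 0.5, "B": 0.5}, tau=0.4))
```

This sketch reads every posting for each query category. An index that exploits the stored probabilities could stop scanning a sorted list early once no remaining posting can reach the threshold.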
Year
2007
DOI
10.1109/ICDE.2007.367907
Venue
2007 IEEE 23rd International Conference on Data Engineering, Vols 1-3
Keywords
computer science,relational databases,inverted index,error correction,web pages,tree data structures,data cleaning,database indexing,uncertainty,categorical data,indexation,indexing,database management systems,database integration,database management system,application software,biology,r tree,database systems
Field
Inverted index,Data integration,R-tree,Data mining,Relational database,Categorical variable,Computer science,Search engine indexing,Probabilistic logic,Database index,Database
DocType
Conference
ISSN
1084-4627
Citations
69
PageRank
2.61
References
20
Authors
5
Name, Order, Citations, PageRank
Sarvjeet Singh, 1, 338, 12.79
Chris Mayfield, 2, 335, 18.86
Sunil Prabhakar, 3, 2664, 152.75
Rahul Shah, 4, 1059, 61.31
Susanne E. Hambrusch, 5, 1210, 102.99