Abstract |
---|
Uncertainty in categorical data is commonplace in many applications, including data cleaning, database integration, and biological annotation. In such domains, the correct value of an attribute is often unknown, but may be selected from a reasonable number of alternatives. Current database management systems do not provide a convenient means for representing or manipulating this type of uncertainty. In this paper we extend traditional systems to explicitly handle uncertainty in data values. We propose two index structures for efficiently searching uncertain categorical data, one based on the R-tree and another based on an inverted index structure. Using these structures, we provide a detailed description of the probabilistic equality queries they support. Experimental results using real and synthetic datasets demonstrate how these index structures can effectively improve the performance of queries through the use of internal probabilistic information. |
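
As a rough illustration of the kind of query the abstract describes, the sketch below models an uncertain categorical value as a mapping from category to probability, and builds a simple inverted index over those values. The abstract does not give the formal definitions, so everything here is an assumption made for illustration: the equality probability is taken to be the sum of products of matching category probabilities (which presumes the two values are independent), and the names `equality_probability`, `InvertedIndex`, and `threshold_query` are hypothetical rather than the authors' own.

```python
from collections import defaultdict


def equality_probability(u, v):
    """Probability that two independent uncertain values are equal.

    Each argument is a dict mapping category -> probability.
    Assumption: Pr(u = v) = sum over categories c of u[c] * v[c].
    """
    # Iterate over the smaller distribution to limit lookups.
    if len(u) > len(v):
        u, v = v, u
    return sum(p * v.get(cat, 0.0) for cat, p in u.items())


class InvertedIndex:
    """Toy inverted index: one posting list per category (hypothetical design)."""

    def __init__(self):
        self.postings = defaultdict(list)  # category -> [(probability, tuple_id)]
        self.tuples = {}                   # tuple_id -> full distribution

    def add(self, tid, dist):
        self.tuples[tid] = dist
        for cat, p in dist.items():
            if p > 0.0:
                self.postings[cat].append((p, tid))

    def threshold_query(self, query, tau):
        """Return (tuple_id, probability) pairs with Pr(tuple = query) >= tau."""
        # Only tuples sharing at least one category with the query can have a
        # non-zero equality probability, so the posting lists give candidates.
        candidates = set()
        for cat, q in query.items():
            if q <= 0.0:
                continue
            for _p, tid in self.postings.get(cat, ()):
                candidates.add(tid)
        scored = (
            (tid, equality_probability(query, self.tuples[tid]))
            for tid in candidates
        )
        hits = [(tid, pr) for tid, pr in scored if pr >= tau]
        # Highest probability first; ties broken by tuple id.
        return sorted(hits, key=lambda r: (-r[1], r[0]))


index = InvertedIndex()
index.add("t1", {"brake": 0.5, "tires": 0.5})
index.add("t2", {"trans": 0.2, "suspension": 0.8})
index.add("t3", {"brake": 0.9, "exhaust": 0.1})
print(index.threshold_query({"brake": 1.0}, tau=0.6))
```

This sketch uses the index only to collect candidates and then scores each one exactly. Using the stored probabilities to prune candidates early, as the abstract suggests the proposed structures do, is not attempted here.
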
Year | DOI | Venue |
---|---|---|
2007 | 10.1109/ICDE.2007.367907 | 2007 IEEE 23rd International Conference on Data Engineering, Vols 1-3 |
Keywords | Field | DocType |
computer science, relational databases, inverted index, error correction, web pages, tree data structures, data cleaning, database indexing, uncertainty, categorical data, indexation, indexing, database management systems, database integration, database management system, application software, biology, r tree, database systems | Inverted index, Data integration, R-tree, Data mining, Relational database, Categorical variable, Computer science, Search engine indexing, Probabilistic logic, Database index, Database | Conference |
ISSN | Citations | PageRank |
1084-4627 | 69 | 2.61 |
References | Authors | |
20 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sarvjeet Singh | 1 | 338 | 12.79 |
Chris Mayfield | 2 | 335 | 18.86 |
Sunil Prabhakar | 3 | 2664 | 152.75 |
Rahul Shah | 4 | 1059 | 61.31 |
Susanne E. Hambrusch | 5 | 1210 | 102.99 |