Abstract | ||
---|---|---|
Data completeness is an important aspect of data quality. We consider a setting in which databases can be incomplete in two ways: records may be missing, and records may contain null values. We (i) formalize when the answer set of a query is complete in spite of such incompleteness, and (ii) introduce table completeness statements, by which one can express that certain parts of a database are complete. We then study how to deduce from a set of table completeness statements that a query can be answered completely. Null values as used in SQL are ambiguous: they can indicate either that no attribute value exists or that a value exists but is unknown. We study completeness reasoning for the different interpretations. We show that in the combined case it is necessary to syntactically distinguish between different kinds of null values, and we present an encoding for doing that in standard SQL databases. With this technique, any SQL DBMS evaluates complete queries correctly with respect to the different meanings that nulls can carry. We study the complexity of completeness reasoning and provide algorithms that in most cases agree with the worst-case lower bounds. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1145/2396761.2396875 | CIKM |
Keywords | Field | DocType |
different meaning,sql dbms,standard sql databases,table completeness statement,data completeness,complete query,completeness reasoning,different kind,null value,different interpretation,data quality | SQL,Data mining,Data quality,Information retrieval,Computer science,Query by Example,Completeness (statistics),Metadata management,Database,Null (SQL),Encoding (memory) | Conference |
Citations | PageRank | References |
6 | 0.56 | 13 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Werner Nutt | 1 | 2009 | 395.43 |
Simon Razniewski | 2 | 157 | 27.07 |