Title
Sample Debiasing in the Themis Open World Database System
Abstract
Open world database management systems assume that tuples absent from the database still exist, and such systems are becoming an increasingly important area of research. We present Themis, the first open world database that automatically rebalances arbitrarily biased samples to approximately answer queries as if they were issued over the entire population. We leverage a priori population aggregate information to develop and combine two approaches to automatic debiasing: sample reweighting and Bayesian network probabilistic modeling. We build a prototype of Themis and demonstrate that it achieves higher query accuracy than the default approximate query processing (AQP) approach, an alternative sample reweighting technique, and a variety of Bayesian network models, while maintaining interactive query response times. We also show that Themis is robust to differences in support between the sample and the population, a key use case when using social media samples.
Year: 2020
DOI: 10.1145/3318464.3380606
Venue: SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, June 2020
DocType: Conference
ISBN: 978-1-4503-6735-6
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name  Order  Citations  PageRank
Laurel J. Orr  1102.53
Magdalena Balazinska  24513301.06
Dan Suciu  396251349.54