Efficient sampling strategies for relational database operations - Citegraph

Paper Info

Title
Efficient sampling strategies for relational database operations

Abstract
Recently, we have proposed an adaptive, random-sampling algorithm for general query size estimation in databases. In an earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm; in this paper we investigate its practicality as applied to the relational database operations select, project, and join. We extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Also, we provide “sanity bounds” to deal with queries for which the underlying data are extremely skewed or the query result is very small. We investigate how the existence of indices can be used to generate more efficient sampling algorithms for the operations of project and join. Finally, we report on the performance of the estimation algorithm, both as implemented in “stand alone” C programs and as implemented in a host language on a commericial relational system.

Year	DOI	Venue
1993	10.1016/0304-3975(93)90224-H	Theor. Comput. Sci.
Keywords	Field	DocType
efficient sampling strategy,relational database operation,relational database	Query optimization,Data mining,Database query,Relational database,Computer science,View,Theoretical computer science,Database design,Sampling (statistics),Gibbs sampling	Journal
Volume	Issue	ISSN
116	1	Theoretical Computer Science
Citations	PageRank	References
56	17.17	27
Authors
4

Authors (4 rows)

Cited by (56 rows)

References (27 rows)

Name	Order	Citations	PageRank
Richard J. Lipton	1	6412	1796.57
Jeffrey F. Naughton	2	8363	1913.71
Donovan A. Schneider	3	1826	694.32
S. Seshadri	4	472	168.68

1