Title
New sampling-based summary statistics for improving approximate query answers
Abstract
In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. Before DBMSs providing highly-accurate approximate answers can become a reality, many new techniques for summarizing data and for estimating answers from summarized data must be developed. This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution. We quantify their advantages over standard sample views in terms of the number of additional sample points for the same view size, and hence in providing more accurate query answers. Finally, we consider their application to providing fast approximate answers to hot list queries. Our algorithms maintain their accuracy in the presence of ongoing insertions to the data warehouse.
Year: 1998
DOI: 10.1145/276304.276334
Venue: SIGMOD Conference
Keywords: additional sample point, summarized data, data warehouse, approximate answer, large data recording, approximate query answer, highly-accurate approximate answer, fast incremental maintenance, data distribution, new sampling-based summary statistic, new technique
Field: Data warehouse, Data mining, Data recording, Information retrieval, Computer science, Incremental maintenance, Reservoir sampling, Sampling (statistics), Summary statistics, Database
DocType: Conference
Volume: 27
Issue: 2
ISSN: 0163-5808
ISBN: 0-89791-995-5
Citations: 265
PageRank: 51.51
References: 23
Authors: 2
Name, Order, Citations, PageRank
Phillip B. Gibbons, 1, 6863, 624.14
Y. Matias, 2, 267, 51.90