Title
Time-Interval Sampling for Improved Estimations in Data Warehouses
Abstract
In large data warehouses, pre-computed sampling summaries can return very fast approximate answers to user queries and are well suited to all types of exploration analysis. However, their use is constrained by the need for a representative number of samples in each grouping interval to yield acceptable accuracy. In this paper we propose and evaluate a technique that addresses this representation issue by using time interval-biased stratified samples (TISS). The technique delivers fast, accurate analysis by taking advantage of the importance of the time dimension in most user analyses. It is designed as a transparent middle layer that analyzes and rewrites the query to use a summary instead of the base data warehouse. The estimates and error bounds returned by the technique are compared with those of traditional sampling summaries to show that it achieves a significant improvement in accuracy.
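The abstract does not spell out how TISS summaries are built, so the following is only a minimal illustrative sketch of the general idea of stratifying a fact table by time interval and sampling some intervals more heavily than others. The function names, the `(period, group, value)` row layout, and the per-period `rates` mapping are all hypothetical and are not taken from the paper. Each sampled row carries a weight equal to the number of base rows it represents, so a SUM per group can be estimated by scaling.

```python
import random
from collections import defaultdict


def build_time_biased_sample(rows, rates, seed=0):
    """Draw a stratified sample with one stratum per time period.

    rows  : list of (period, group, value) tuples (hypothetical layout)
    rates : dict mapping period -> sampling rate in (0, 1]; assumed to be
            chosen by the analyst, e.g. higher for more recent periods
    Returns a list of (period, group, value, weight) tuples.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[0]].append(row)

    sample = []
    for period, members in strata.items():
        # Keep at least one row per period so no stratum disappears.
        k = max(1, round(rates[period] * len(members)))
        # Each sampled row stands for this many rows of the base data.
        weight = len(members) / k
        for p, group, value in rng.sample(members, k):
            sample.append((p, group, value, weight))
    return sample


def estimate_sum_by_group(sample):
    """Estimate SUM(value) per group by scaling sampled values by weight."""
    totals = defaultdict(float)
    for _period, group, value, weight in sample:
        totals[group] += value * weight
    return dict(totals)


# Usage: 2002 is sampled at 50%, 2001 at only 10% (assumed rates).
rows = [(2001, "a", 1.0)] * 100 + [(2002, "a", 2.0)] * 100
sample = build_time_biased_sample(rows, {2001: 0.1, 2002: 0.5})
estimate = estimate_sum_by_group(sample)
```

In this sketch a query over a recent period draws on more sampled rows than one over an older period, which is one plausible way to trade summary size against accuracy along the time dimension. A middle layer of the kind the abstract describes would additionally have to rewrite incoming queries to read from the sample and report error bounds, which this sketch does not attempt.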
Year
2002
DOI
10.1007/3-540-46145-0_32
Venue
DaWaK
Keywords
base data warehouse,acceptable accuracy,large data warehouse,pre-computed sampling summaries well-fit,time dimension,exploration analysis,user analysis,data warehouses,traditional sampling summary,approximate answer,accurate analysis,time-interval sampling,improved estimations,data warehouse,stratified sampling
Field
Data warehouse,User analysis,Data mining,Database query,Information retrieval,Computer science,Sampling (statistics),Stratified sampling,Database,Multiple time dimensions
DocType
Conference
ISBN
3-540-44123-9
Citations
6
PageRank
0.51
References
7
Authors
2
Name                Order  Citations  PageRank
Pedro Furtado       1      204        55.67
João Pedro Costa    2      64         11.99