Title
Fast computation of approximate biased histograms on sliding windows over data streams
Abstract
Histograms provide effective synopses of large data sets, and are thus used in a wide variety of applications, including query optimization, approximate query answering, distribution fitting, parallel database partitioning, and data mining. Moreover, very fast approximate algorithms are needed to compute accurate histograms on fast-arriving data streams, whereby online queries can be supported within the given memory and computing resources. Many real-life applications require that the data distribution in certain regions be modeled with greater accuracy, and biased histograms are designed to address this need. In this paper, we define biased histograms over data streams and sliding windows on data streams, and propose the Bar Splitting Biased Histogram (BSBH) algorithm to construct them efficiently and accurately. We prove that BSBH generates expected ε-approximate biased histograms for data streams with stationary distributions, and our experiments show that BSBH also achieves good approximation in the presence of concept shifts, even major ones. Additionally, BSBH employs a new biased sampling technique which outperforms uniform sampling in terms of accuracy, while using about the same amount of time and memory. Therefore, BSBH outperforms previously proposed algorithms for computing biased histograms over the whole data stream, and it is the first algorithm that supports windows.
Year
2013
DOI
10.1145/2484838.2484851
Venue
SSDBM
Keywords
approximate query answering, data mining, fast-arriving data stream, large data set, bar splitting biased histogram, fast computation, whole data stream, biased histograms, data distribution, approximate algorithm, data stream, data streams, quantiles
Field
Query optimization, Data mining, Histogram, Data stream mining, Data set, Data stream, Computer science, Parallel database, Distribution fitting, Sampling (statistics), Database
DocType
Conference
Citations
3
PageRank
0.41
References
20
Authors
2
Name            Order  Citations  PageRank
Hamid Mousavi   1      39         5.91
Carlo Zaniolo   2      4305       1447.58