Title
Estimating statistical aggregates on probabilistic data streams
Abstract
The probabilistic-stream model was introduced by Jayram et al. [20].It is a generalization of the data stream model that issuited to handling "probabilistic" data, where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over apotentially exponential number of classical "deterministic" streams where each item is deterministically one of the domain values. Designing efficient aggregation algorithms for probabilistic data is crucial for handling uncertainty in data-centric applications such as OLAP. Such algorithms are also useful in a variety of other setting including analyzing search engine traffic and aggregation in sensor networks. We present algorithms for computing commonly used aggregates ona probabilistic stream. We present the first one pass streaming algorithms for estimating the expected mean of a probabilistic stream, improving upon results in [20]. Next, we consider the problem of estimating frequency moments for probabilistic data. We propose a general approach to obtain unbiased estimators working over probabilistic data by utilizing unbiased estimators designed for standard streams. Applying this approach, we extend a classical data stream algorithm to obtain a one-pass algorithm for estimating F2, the second frequency moment. We present the first known streaming algorithms forestimating F0, the number of distinct items on probabilistic streams.Our work also gives an efficient one-pass algorithm for estimatingthe median of a probabilistic stream.
Year
DOI
Venue
2007
10.1145/1412331.1412338
ACM Transactions on Database Systems (TODS)
Keywords
DocType
Volume
probabilistic stream model,unbiased estimator,additional key words and phrases: probabilistic streams,efficient aggregation algorithm,probabilistic data,probabilistic data stream,efficient one-pass algorithm,probabilistic stream,one-pass algorithm,statistical aggregate,frequency moment,frequency moments,median,data stream model,classical deterministic stream,mean,standard stream,olap,probabilistic streams,classical data stream algorithm,search engine,algorithms,theory,probability distribution,sensor network,streaming algorithm
Conference
33
Issue
ISSN
Citations 
4
0362-5915
54
PageRank 
References 
Authors
1.96
23
4
Name
Order
Citations
PageRank
T. S. Jayram1137375.87
Andrew McGregor2541.96
S. Muthukrishnan38025734.98
Erik Vee474743.61