Title
Distributed Set Expression Cardinality Estimation
Abstract
We consider the problem of estimating set-expression cardinality in a distributed streaming environment where rapid update streams originating at remote sites are continually transmitted to a central processing sys- tem. At the core of our algorithmic solutions for answering set-expression cardinality queries are two novel techniques for lowering data communication costs without sacrificing answer precision. Our first technique exploits global knowledge of the distribution of certain frequently occurring stream elements to sig- nificantly reduce the transmission of element state in- formation to the central site. Our second technical con- tribution involves a novel way of capturing the seman- tics of the input set expression in a boolean logic for- mula, and using models (of the formula) to determine whether an element state change at a remote site can affect the set expression result. Results of our experi- mental study with real-life as well as synthetic data sets indicate that our distributed set-expression cardinality estimation algorithms achieve substantial reductions in message traffic compared to naive approaches that pro- vide the same accuracy guarantees.
Year
DOI
Venue
2004
10.1016/B978-012088469-8.50030-9
VLDB
Keywords
Field
DocType
element state information,central processing system,remote site,data communication cost,central site,boolean logic formula,set-expression cardinality query,set-expression cardinality,element state change,set-expression cardinality estimation algorithm,synthetic data
Data mining,State information,Computer science,Cardinality,Exploit,Theoretical computer science,Boolean algebra,Synthetic data sets,Database,Semantics
Conference
ISBN
Citations 
PageRank 
0-12-088469-0
39
1.93
References 
Authors
11
4
Name
Order
Citations
PageRank
Abhinandan Das130416.00
Sumit Ganguly2813236.01
Minos Garofalakis34904664.22
Rajeev Rastogi46151827.22