Title
Near-optimal algorithms for shared filter evaluation in data stream systems
Abstract
We consider the problem of evaluating multiple overlapping queries defined on data streams, where each query is a conjunction of multiple filters and each filter may be shared across multiple queries. Efficient support for overlapping queries is a critical issue in emerging data stream systems, particularly when filters are expensive in terms of computational complexity and processing time. This problem generalizes other well-known problems such as pipelined filter ordering and set cover, and is not only NP-hard but also hard to approximate within a factor of o(log n) of the optimum, where n is the number of queries. In this paper, we present two near-optimal approximation algorithms with provably good performance guarantees for the evaluation of overlapping queries. We present an edge-coverage based Greedy algorithm which achieves an approximation ratio of (1 + log(n) + log(α)), where n is the number of queries and α is the average number of filters in a query. We also present a randomized, fast and easily parallelizable Harmonic algorithm which achieves an approximation ratio of 2β, where β is the maximum number of filters in a query. We have implemented these algorithms in a prototype system and evaluated their performance through extensive experiments in the context of multimedia stream analysis. The results show that our Greedy algorithm consistently outperforms other known algorithms under various settings and scales well as the numbers of queries and filters increase.
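As a rough illustration of the shared-filter setting, the Python sketch below models each query as a set of shared filters and resolves all queries on one stream item. It repeatedly evaluates the filter with the lowest cost per expected number of open (query, filter) edges it would close, in the spirit of an edge-coverage greedy rule. The per-filter costs, independent pass probabilities, the exact scoring rule, and the names `greedy_shared_filters`, `cost`, `pass_prob` and `outcome` are all hypothetical assumptions for illustration; the paper's Greedy algorithm may define edge coverage and its selection rule differently.

```python
from typing import Dict, List, Set, Tuple


def greedy_shared_filters(
    queries: Dict[str, Set[str]],
    cost: Dict[str, float],
    pass_prob: Dict[str, float],
    outcome: Dict[str, bool],
) -> Tuple[Dict[str, bool], float, List[str]]:
    """Resolve every conjunctive query on one stream item.

    Hypothetical model: each shared filter is evaluated at most once, costs
    cost[f], and passes independently with probability pass_prob[f].
    `outcome` holds the filter results for this item (revealed on evaluation).
    Returns (query answers, total cost spent, filter evaluation order).
    """
    # Unevaluated filters of each still-unresolved query (the open edges).
    remaining = {q: set(fs) for q, fs in queries.items()}
    answers: Dict[str, bool] = {}
    spent = 0.0
    order: List[str] = []

    # A query with no filters is trivially satisfied.
    for q in [q for q, fs in remaining.items() if not fs]:
        answers[q] = True
        del remaining[q]

    def score(f: str) -> float:
        # Assumed rule: cost per expected number of open edges removed.
        owners = [q for q, fs in remaining.items() if f in fs]
        on_pass = len(owners)  # only the (q, f) edges close
        on_fail = sum(len(remaining[q]) for q in owners)  # owners resolve to False
        expected = pass_prob[f] * on_pass + (1 - pass_prob[f]) * on_fail
        return cost[f] / expected

    while remaining:
        candidates = set().union(*remaining.values())
        # Cheapest filter per expected edge coverage; ties broken by name.
        f = min(candidates, key=lambda g: (score(g), g))
        spent += cost[f]
        order.append(f)
        for q in [q for q, fs in remaining.items() if f in fs]:
            if outcome[f]:
                remaining[q].discard(f)
                if not remaining[q]:  # every filter of q passed
                    answers[q] = True
                    del remaining[q]
            else:  # one failed filter falsifies the whole conjunction
                answers[q] = False
                del remaining[q]
    return answers, spent, order


# Hypothetical multimedia filters and queries, for illustration only.
queries = {"q1": {"face", "motion"}, "q2": {"motion", "audio"}, "q3": {"audio"}}
cost = {"face": 10.0, "motion": 2.0, "audio": 4.0}
pass_prob = {"face": 0.5, "motion": 0.2, "audio": 0.9}
outcome = {"face": True, "motion": False, "audio": True}

answers, spent, order = greedy_shared_filters(queries, cost, pass_prob, outcome)
print(answers, spent, order)
```

In this toy instance the `motion` filter is evaluated first because it is cheap, shared by q1 and q2, and likely to fail; its failure resolves both queries at once, so the expensive `face` filter is never run. Sharing is what makes the ordering problem harder than ordering filters for a single query: one evaluation can settle several queries.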
Year
2008
DOI
10.1145/1376616.1376633
Venue
SIGMOD Conference
Keywords
near-optimal algorithm, approximation ratio, known algorithm, data stream system, shared filter evaluation, log n, filters increase, overlapping query, multiple filter, multiple overlapping query, average number, greedy algorithm, maximum number, set cover, randomized algorithm, computational complexity, query optimization
Field
Query optimization, Randomized algorithm, Binary logarithm, Set cover problem, Data stream mining, Computer science, Data stream, Algorithm, Theoretical computer science, Greedy algorithm, Database, Computational complexity theory
DocType
Conference
Citations
28
PageRank
1.50
References
21
Authors
4
Name
Order
Citations
PageRank
Zhen Liu11088102.40
Srinivasan Parthasarathy24666375.76
Anand Ranganathan32696164.67
Hao Yang466048.26