Title
Efficiently Filtering Duplicates over Distributed Data Streams
Abstract
We study the problem of filtering duplicate items over physically distributed data streams in order to provide clean data for real-time monitoring applications. Existing approaches filter only local duplicates within each stream, and their space and time costs make them hardly feasible for high-speed data streams. Based on the space- and time-efficient Bloom filter data structure, we propose a novel local filtering algorithm that efficiently filters local duplicates, and then extend it to global duplicate filtering, which has not been addressed before. To accommodate different levels of additional communication overhead in global duplicate filtering, we present eager and lazy approaches to Bloom filter sharing. Theoretical and experimental results show that our solution filters duplicates efficiently both locally and globally, with small errors when the parameters are set properly.
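The abstract builds on Bloom filters, so a small example may help readers unfamiliar with how one drops repeated items. Below is a minimal sketch of a standard Bloom filter used as a per-stream duplicate filter. It is not the authors' algorithm. The class and function names (`BloomFilter`, `filter_duplicates`, `false_positive_rate`), the SHA-256 double-hashing scheme, and the parameter values are illustrative assumptions. The bitwise-OR `union` is shown only as one generic way that filters from different streams could be combined. It does not reproduce the paper's eager or lazy sharing protocols. The sketch also omits deletion and windowing.

```python
import hashlib
import math
from typing import Iterable, Iterator


class BloomFilter:
    """Standard Bloom filter with m bits and k hash functions (illustrative; no deletions)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Assumption: derive k indices from one SHA-256 digest by double hashing,
        # h_i = h1 + i * h2 (mod m).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # True if all k bits are set; may be a false positive, never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # The bitwise OR of two filters with identical (m, k) and hashing
        # represents the union of the two inserted sets.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k")
        merged = BloomFilter(self.m, self.k)
        merged.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return merged


def filter_duplicates(stream: Iterable[str], bf: BloomFilter) -> Iterator[str]:
    """Yield items not seen before. A false positive may wrongly drop a fresh item."""
    for item in stream:
        if item in bf:
            continue  # reported as a duplicate
        bf.add(item)
        yield item


def false_positive_rate(m: int, n: int, k: int) -> float:
    # Standard approximation (1 - e^{-kn/m})^k after n insertions.
    return (1.0 - math.exp(-k * n / m)) ** k
```

For example, `list(filter_duplicates(["a", "b", "a", "c", "b"], BloomFilter(m=1 << 16, k=7)))` keeps only the first occurrence of each item, unless a rare false positive occurs. Because this filter never deletes, errors are one-sided. A repeated item is always caught, but a genuinely new item can occasionally be discarded. For a given bit count m and expected item count n, choosing k near (m/n) ln 2 minimizes the approximate false-positive rate. This is one reading of "when the arguments are set properly", and the paper's own parameter analysis may differ.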
Year
2008
DOI
10.1109/CSSE.2008.1367
Venue
CSSE (4)
Keywords
random process, distributed databases, scattering, data structure, filtering, random processes, adaptive filters, history, bloom filter, space time, data structures
Field
Data structure, Bloom filter, Data mining, Data stream mining, Computer science, Stochastic process, Filter (signal processing), Adaptive filter, Distributed database, Database, Distributed computing
DocType
Conference
Volume
4
Citations
5
PageRank
0.48
References
8
Authors
3
Name
Order
Citations
PageRank
Xiao-Wei Wang159659.78
Qiang Zhang28820.16
Yan Jia39315.39