Title
One is enough: distributed filtering for duplicate elimination
Abstract
The growth of online services has created the need for duplicate elimination in high-volume streams of events. The sheer volume of data in applications such as pay-per-click clickstream processing, RSS feed syndication and notification services in social sites such as Twitter and Facebook makes traditional centralized solutions hard to scale. In this paper, we propose an approach based on distributed filtering. To this end, we introduce a suite of distributed Bloom filters that exploit different ways of partitioning the event space. To address the continuous nature of event delivery, the filters are extended to support sliding window semantics. Moreover, we examine locality-related tradeoffs and propose a tree-based architecture to allow for duplicate elimination across geographic locations. We cast the design space and present experimental results that demonstrate the pros and cons of our various solutions in different settings.
Year
2011
DOI
10.1145/2063576.2063643
Venue
CIKM
Keywords
rss feed syndication, bloom filter, different way, continuous nature, design space, different setting, duplicate elimination, event delivery, event space, geographic location, difference set, sliding window
Field
Bloom filter, Data mining, Sliding window protocol, Suite, Clickstream, Computer science, Exploit, RSS, Semantics, Web syndication
DocType
Conference
Citations
7
PageRank
0.42
References
17
Authors
4
Name                 Order  Citations  PageRank
Georgia Koloniari    1      220        16.49
Nikos Ntarmos        2      219        15.40
Evaggelia Pitoura    3      1968       321.56
Dimitris Souravlias  4      48         4.34