Title
Out of Many We are One: Measuring Item Batch with Clock-Sketch
Abstract
An item batch is a consecutive sequence of identical items that are close in time in a data stream. It is a useful data stream pattern in caching, burst detection, APT detection, etc. The basic item batch measurement tasks are membership, cardinality, time span, and size. Currently, no algorithm is tailored to item batch measurement. The greatest challenge lies in accurately estimating the time gap between two consecutive identical items. In this paper, we propose Clock-sketch, a framework that introduces the well-known CLOCK algorithm into item batch measurement. The methodology of Clock-sketch is to clean outdated information as much as possible, while guaranteeing that the information of all items visited within the time window $\mathcal{T}$ is preserved. We conduct experiments on three real-world datasets that exhibit the item batch pattern. We compare the accuracy and throughput of Clock-sketch against the state of the art and against two naive approaches that do not use the Clock-sketch technique. Results on item batch activeness show that Clock-sketch outperforms the state-of-the-art SWAMP, achieving a false positive rate 50 times lower when memory is small. All source code is open-sourced and released on GitHub.
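To make the four measurement tasks concrete, the following is a minimal exact baseline in Python. It is not the Clock-sketch algorithm and uses no sketch at all: it stores one record per distinct item, which is precisely the memory cost a sketch is meant to avoid. The names `Batch`, `measure_batches`, `is_active`, and `active_cardinality` are hypothetical, and the rule that a gap of at most `window` (the time window T) keeps an item in the same batch is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    start: float  # timestamp of the first item in the batch
    end: float    # timestamp of the most recent item in the batch
    size: int     # number of items in the batch


def measure_batches(stream, window):
    """Exact bookkeeping over (timestamp, item) pairs given in time order.

    Assumption: two occurrences of the same item belong to one batch
    when their time gap is at most `window`.
    """
    active = {}    # item -> its latest batch
    finished = []  # (item, Batch) pairs for batches that were closed
    for ts, item in stream:
        batch = active.get(item)
        if batch is not None and ts - batch.end <= window:
            # Gap is small enough: extend the current batch.
            batch.end = ts
            batch.size += 1
        else:
            # Gap too large (or first occurrence): start a new batch.
            if batch is not None:
                finished.append((item, batch))
            active[item] = Batch(ts, ts, 1)
    return active, finished


def is_active(active, item, now, window):
    """Membership: does `item` have a batch still alive at time `now`?"""
    batch = active.get(item)
    return batch is not None and now - batch.end <= window


def active_cardinality(active, now, window):
    """Cardinality: number of batches still alive at time `now`."""
    return sum(1 for b in active.values() if now - b.end <= window)
```

Time span and size of a batch are then `batch.end - batch.start` and `batch.size`. The hard part named in the abstract, estimating the gap between two consecutive identical items, is trivial here only because every item's last timestamp is stored exactly.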
Year: 2021
DOI: 10.1145/3448016.3452784
Venue: International Conference on Management of Data
Keywords: Item Batch, Clock, Sketch, Data stream mining
DocType: Conference
ISSN: 0730-8078
Citations: 1
PageRank: 0.35
References: 0
Authors: 5
Name | Order | Citations | PageRank
Peiqing Chen | 1 | 1 | 0.35
Dong Chen | 2 | 1 | 0.35
Lingxiao Zheng | 3 | 1 | 0.35
Jin Li | 4 | 8 | 3.38
Tong Yang | 5 | 20837.35