Title
Streaming Algorithms for Estimating High Set Similarities in LogLog Space
Abstract
Estimating set similarity and detecting highly similar sets are fundamental problems in areas such as databases and machine learning. MinHash is a well-known technique for approximating Jaccard similarity of sets and has been successfully used for many applications. Its two compressed versions, $b$...
Year: 2021
DOI: 10.1109/TKDE.2020.2969423
Venue: IEEE Transactions on Knowledge and Data Engineering
Keywords: Registers, Estimation error, Time complexity, Trajectory, Databases, Machine learning
DocType: Journal
Volume: 33
Issue: 10
ISSN: 1041-4347
Citations: 0
PageRank: 0.34
References: 0
Authors: 8
Name | Order | Citations | PageRank
Yiyan Qi | 1 | 14 | 3.24
Ping-Hui Wang | 2 | 236 | 33.39
Yuanming Zhang | 3 | 5 | 4.48
Qiaozhu Zhai | 4 | 12 | 3.75
Chenxu Wang | 5 | 10 | 6.22
Guangjian Tian | 6 | 14 | 4.56
John C.S. Lui | 7 | 3680 | 279.85
X. Guan | 8 | 1169 | 137.97