Title
ATLAS: a probabilistic algorithm for high dimensional similarity search
Abstract
Given a set of high dimensional binary vectors and a similarity function (such as Jaccard and Cosine), we study the problem of finding all pairs of vectors whose similarity exceeds a given threshold. The solution to this problem is a key component in many applications with feature-rich objects, such as text, images, music, videos, or social networks. In particular, there are many important emerging applications that require the use of relatively low similarity thresholds. We propose ATLAS, a probabilistic similarity search algorithm that in expectation finds a 1 - δ fraction of all similar vector pairs. ATLAS uses truly random permutations both to filter candidate pairs of vectors and to estimate the similarity between vectors. At a 97.5% recall rate, ATLAS consistently outperforms all state-of-the-art approaches and achieves a speed-up of up to two orders of magnitude over both exact and approximate algorithms.
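The abstract's idea of using random permutations both to filter candidate pairs and to estimate similarity can be illustrated with standard min-wise hashing for Jaccard similarity. The sketch below is not the ATLAS algorithm itself, only a minimal illustration under stated assumptions: each binary vector is represented as a non-empty set of the indices of its 1-bits, permutations come from Python's pseudo-random generator rather than a truly random source, and all function names (`make_permutations`, `signature`, `estimate_jaccard`, `similar_pairs`) are hypothetical.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity of two sets of 1-bit indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_permutations(dimension, k, seed=0):
    """Draw k pseudo-random permutations of range(dimension).

    Assumption: a seeded PRNG stands in for the truly random
    permutations mentioned in the abstract.
    """
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        perm = list(range(dimension))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def signature(vector, perms):
    """Min-wise signature: smallest permuted index among the 1-bits.

    Assumes `vector` is a non-empty set of indices.
    """
    return [min(perm[i] for i in vector) for perm in perms]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of permutations on which the two minima agree.

    For a uniform random permutation, the minima agree with
    probability equal to the Jaccard similarity.
    """
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


def similar_pairs(vectors, dimension, threshold, k=200, seed=0):
    """Return name pairs whose estimated Jaccard meets the threshold.

    Brute-force over all pairs for clarity; a real system would
    first filter candidates instead of comparing every pair.
    """
    perms = make_permutations(dimension, k, seed)
    sigs = {name: signature(v, perms) for name, v in vectors.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(sigs), 2)
        if estimate_jaccard(sigs[a], sigs[b]) >= threshold
    ]
```

Increasing `k` tightens the estimate at the cost of more work per vector; because the estimate is probabilistic, pairs whose true similarity is close to the threshold may be missed or falsely reported, which is the kind of trade-off the recall target 1 - δ describes.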
Year: 2011
DOI: 10.1145/1989323.1989428
Venue: SIGMOD Conference
Keywords: feature-rich object, low similarity threshold, probabilistic similarity search algorithm, similarity function, recall rate, random permutation, candidate pair, high dimensional similarity search, probabilistic algorithm, high dimensional binary vector, approximate algorithm, key component, data mining, similarity search, social network
Field: Randomized algorithm, Data mining, Trigonometric functions, Pattern recognition, Computer science, Permutation, Similarity (network science), Artificial intelligence, Jaccard index, Probabilistic logic, Nearest neighbor search, Binary number
DocType: Conference
Citations: 17
PageRank: 0.80
References: 26
Authors: 3
Name | Order | Citations | PageRank
Jiaqi Zhai | 1 | 17 | 0.80
Yin Lou | 2 | 506 | 28.82
Johannes Gehrke | 3 | 13362 | 1055.06