Title
An efficient LSH indexing on discriminative short codes for high-dimensional nearest neighbors
Abstract
In massive multimedia era, the dimension curse and the I/O performance bottleneck have become two major challenges for disk-based Approximate Nearest Neighbor (ANN) search. Hashing is a popular solution to overcome the dimension curse, one promising hashing technique is Locality Sensitive Hashing (LSH). However, most existing LSH indexings incur significant I/O cost during the search due to their low NN candidate hits in each I/O access. We recommend a novel method SC-LSH (SortingCodes-LSH) which combines LSH with another hashing technique (i.e., the discriminative short codes) to lift the hit of NN candidates so as to further boost the ANN search performance. Firstly, we intensify an LSH index and sort all the compound hashing keys according to a linear order to make similar NN candidates distributed locally. Then we generate product quantization (PQ) codes to use them as candidates instead of the original data points. These space-efficient short codes can enable us acquire significantly candidates via much less I/O operations. Moreover, based on theoretical and empirical studies among series of space-filling curves, we finally choose the Gray curve as the linear order to produce better local distribution of candidate data. All these above significantly increase the NN hits during each I/O, which greatly reduce the amount of necessary I/O access. Meanwhile, with the good similarity preserving ability, PQ codes are precise enough to discriminate NNs and thus guarantee the accuracy. Empirical study demonstrates that, comparing with four state-of-the-arts, SC-LSH achieves the best accuracy with significantly smaller I/O cost and space consumption. In fact, depending on the datasets, the I/O cost (resp., space consumption) of our scheme is only 5%-20% (resp., 1%-20%) of the other methods.
Year
DOI
Venue
2019
10.1007/s11042-018-6987-0
Multimedia Tools and Applications
Keywords
Field
DocType
Approximate nearest neighbor, Hashing, Locality-sensitive hashing, Discriminative short codes, Linear order
k-nearest neighbors algorithm,Data point,Locality-sensitive hashing,Bottleneck,Pattern recognition,Computer science,sort,Search engine indexing,Hash function,Artificial intelligence,Discriminative model
Journal
Volume
Issue
ISSN
78
17
1573-7721
Citations 
PageRank 
References 
1
0.36
24
Authors
4
Name
Order
Citations
PageRank
Feng Xiaokang110.36
Jiangtao Cui234036.67
Hui Li320234.25
Yingfan Liu4392.72