Scalable Nearest Neighbor Algorithms for High Dimensional Data - Citegraph

Paper Info

Title
Scalable Nearest Neighbor Algorithms for High Dimensional Data

Abstract
For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
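The randomized k-d forest named in the abstract can be illustrated with a minimal, pure-Python sketch. This is not FLANN's implementation. The function names (`build_tree`, `build_forest`, `search`), the parameters (`top_dims`, `leaf_size`, `max_checks`), and the tuple-based node layout are assumptions made for illustration. Each tree splits on a dimension picked at random from the highest-variance ones, and one priority queue shared by all trees decides which unexplored branch to visit next until a budget of point checks is used up.

```python
import heapq
import itertools
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(points, indices, rng, top_dims=5, leaf_size=4):
    """Build one randomized k-d tree over the given point indices."""
    if len(indices) <= leaf_size:
        return ("leaf", indices)

    def spread(d):
        vals = [points[i][d] for i in indices]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    # Randomization: split on one of the highest-variance dimensions.
    ranked = sorted(range(len(points[0])), key=spread, reverse=True)
    dim = rng.choice(ranked[:top_dims])
    vals = sorted(points[i][dim] for i in indices)
    split = vals[len(vals) // 2]
    left = [i for i in indices if points[i][dim] < split]
    right = [i for i in indices if points[i][dim] >= split]
    if not left or not right:  # degenerate split, e.g. many equal values
        return ("leaf", indices)
    return ("node", dim, split,
            build_tree(points, left, rng, top_dims, leaf_size),
            build_tree(points, right, rng, top_dims, leaf_size))


def build_forest(points, n_trees=4, seed=0):
    """Build several trees over the same points, differing only by chance."""
    rng = random.Random(seed)
    all_indices = list(range(len(points)))
    return [build_tree(points, all_indices, rng) for _ in range(n_trees)]


def search(forest, points, query, max_checks=64):
    """Return (index, squared distance) of the best match found.

    max_checks is an approximate budget: it is tested once per leaf,
    so slightly more points than max_checks may be examined.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(0.0, next(tie), tree) for tree in forest]
    heapq.heapify(heap)
    best_idx, best_d = -1, float("inf")
    seen = set()  # points already checked through another tree
    while heap and len(seen) < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break  # every remaining branch is at least this far away
        while node[0] == "node":
            _, dim, split, left, right = node
            diff = query[dim] - split
            near, far = (left, right) if diff < 0 else (right, left)
            # Queue the far side with a lower bound on its distance.
            heapq.heappush(heap, (max(bound, diff * diff), next(tie), far))
            node = near
        for i in node[1]:
            if i not in seen:
                seen.add(i)
                d = sq_dist(points[i], query)
                if d < best_d:
                    best_idx, best_d = i, d
    return best_idx, best_d
```

In this sketch, lowering `max_checks` trades accuracy for speed, while a budget at least as large as the data set makes the search exact. That trade-off is the kind of parameter the automated configuration procedure in the abstract is described as tuning.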

Year: 2014
DOI: 10.1109/TPAMI.2014.2321376
Venue: IEEE Trans. Pattern Anal. Mach. Intell.
Keywords: priority search k-means tree, image matching, scalable nearest neighbor algorithms, big data, trees (mathematics), learning (artificial intelligence), approximate search, flann, multiple hierarchical clustering trees, algorithm configuration, distributed nearest neighbor matching framework, search problems, nearest neighbor search, fast library for approximate nearest neighbors, randomized k-d forest algorithm, computer vision, high dimensional data, high dimensional vectors, machine learning problems, automated configuration procedure, high dimensional feature matching, open source library, vegetation, approximation algorithms, clustering algorithms
Field: k-nearest neighbors algorithm, R-tree, Data mining, Computer science, Ball tree, Best bin first, Algorithm, Nearest neighbor graph, Cover tree, Large margin nearest neighbor, Nearest neighbor search
DocType: Journal
Volume: 36
Issue: 11
ISSN: 0162-8828
Citations: 125
PageRank: 2.59
References: 11
Authors: 2

Authors (2 rows)

Name	Order	Citations	PageRank
Marius Muja	1	125	2.59
D. G. Lowe	2	15718	1413.60
