Title
When Is ''Nearest Neighbor'' Meaningful?
Abstract
We explore the effect of dimensionality on the "nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!
Year
DOI
Venue
1999
10.1007/3-540-49257-7_15
ICDT
Keywords
Field
DocType
high-dimensional indexing technique,database literature,farthest data point,nearest neighbor,high-dimensional indexing,synthetic data set,broad set,high-dimensional workloads,nearest data point,dimensionality increase,synthetic data,independent and identically distributed
k-nearest neighbors algorithm,Data mining,Indexation,Linear Scan,Computer science,Search engine indexing,Curse of dimensionality,Independent and identically distributed random variables,Synthetic data sets,Nearest neighbor search
Conference
Volume
ISSN
ISBN
1540
0302-9743
3-540-65452-6
Citations 
PageRank 
References 
913
73.28
26
Authors
4
Search Limit
100913
Name
Order
Citations
PageRank
Kevin S. Beyer11702130.28
Jonathan Goldstein21686142.21
Raghu Ramakrishnan3126492243.05
Uri Shaft41050107.01