Title
When is nearest neighbors indexable?
Abstract
In this paper, we consider whether traditional index structures are effective in processing unstable nearest neighbors workloads. It is known that under broad conditions, nearest neighbors workloads become unstable: distances between data points become indistinguishable from each other. We complement this earlier result by showing that if the workload for your application is unstable, you are not likely to be able to index it efficiently using (almost all known) multidimensional index structures. For a broad class of data distributions, we prove that these index structures will do no better than a linear scan of the data as dimensionality increases. Our result has implications for how experiments should be designed on index structures such as R-Trees, X-Trees and SR-Trees: simply put, experiments trying to establish that these index structures scale with dimensionality should be designed to establish cross-over points, rather than to show that the methods scale to an arbitrary number of dimensions. In other words, experiments should seek to establish the dimensionality of the dataset at which the proposed index structure deteriorates to linear scan, for each data distribution of interest; that linear scan will eventually dominate is a given. An important problem is to analytically characterize the rate at which index structures degrade with increasing dimensionality, because the dimensionality of a real data set may well be in the range that a particular method can handle. The results in this paper can be regarded as a step towards solving this problem. Although we do not characterize the rate at which a structure degrades, our techniques allow us to reason directly about a broad class of index structures, rather than about the geometry of the nearest neighbors problem, in contrast to earlier work.
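The instability the abstract refers to can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes i.i.d. uniform data in the unit cube and Euclidean distance, and the helper names (`relative_contrast`, `mean_contrast`) are hypothetical. It measures how far the farthest point is from a query relative to the nearest one; under these assumptions the ratio tends to shrink as dimensionality grows, which is the sense in which distances become indistinguishable.

```python
import math
import random


def relative_contrast(num_points, dim, rng):
    """Return (d_max - d_min) / d_min for one random query point.

    Assumption: query and data are i.i.d. uniform in [0, 1]^dim.
    """
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


def mean_contrast(num_points, dim, trials=20, seed=0):
    """Average the relative contrast over several independent queries."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = sum(relative_contrast(num_points, dim, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Print the average contrast for a few dimensionalities.
    for dim in (2, 10, 100, 500):
        print(dim, round(mean_contrast(500, dim), 3))
```

A low contrast means that a search region large enough to contain the nearest neighbor also comes close to containing most of the data, which gives an intuition for why a linear scan becomes competitive. The sketch says nothing about the cross-over dimensionality for any particular index structure or real data set.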
Year: 2005
DOI: 10.1007/978-3-540-30570-5_11
Venue: ICDT
Keywords: proposed index structure, index structures scale, data point, index structures degrade, index structure, multidimensional index structure, traditional index structure, data distribution, broad class, nearest neighbors indexable, dimensionality increase, indexation, nearest neighbor
Field: Data point, R-tree, Multidimensional index, Nearest neighbour, Linear Scan, Computer science, Curse of dimensionality, Theoretical computer science
DocType: Conference
Volume: 3363
ISSN: 0302-9743
ISBN: 3-540-24288-0
Citations: 11
PageRank: 0.58
References: 12
Authors: 2
Name, Order, Citations, PageRank
Uri Shaft, 1, 1050, 107.01
Raghu Ramakrishnan, 2, 12649, 2243.05