Title
Similarity search in high-dimensional datasets
Abstract
The problem of finding "similar" multimedia objects is a central one, and a popular approach is to represent objects as vectors in a high-dimensional space, and to build a spatial index over a collection of such vectors in order to retrieve the "nearest neighbors" of a query object. There are some fundamental assumptions involved here. First, that the user's notion of similarity can be captured by distance in the space that the vectors are embedded, and second, that nearest neighbors can be efficiently retrieved. In this talk, we discuss these assumptions, based on our experience with the PiQ image database project, carried out at the University of Wisconsin-Madison, and some subsequent work.We will first present a brief overview of the PiQ system and its goal of identifying the DBMS infrastructure required to support image databases, and discuss the role of similarity and nearest-neighbor queries in content-based querying. Next, we consider when the notion of "nearest neighbor" is well-defined in high-dimensional spaces, and when efficient indexing is feasible. The goal is not to suggest that indexing high-dimensional data is impossible, although our results here are mainly negative. Rather, we seek to identify the conditions under which effective indexing and retrieval techniques are feasible, and to identify the key difficulties that must be overcome. Finally, we present some indexing techniques to retrieve nearest neighbors under appropriate conditions, highlighting the role played by redundancy and approximation.
Year
DOI
Venue
2005
10.1145/1160939.1160941
CVDB
Keywords
Field
DocType
piq image database project,effective indexing,high-dimensional datasets,image databases,high-dimensional space,indexing technique,dbms infrastructure,efficient indexing,nearest neighbor,piq system,similarity search,indexing high-dimensional data,indexation,spatial index,high dimensional data
k-nearest neighbors algorithm,R-tree,Data mining,Information retrieval,Best bin first,Computer science,Search engine indexing,Redundancy (engineering),Image database,Spatial database,Nearest neighbor search
Conference
ISBN
Citations 
PageRank 
1-59593-151-1
0
0.34
References 
Authors
1
3
Name
Order
Citations
PageRank
Raghu Ramakrishnan1126492243.05
Jonathan Goldstein21686142.21
Uri Shaft31050107.01