Title
Indexing inexact proximity search with distance regression in pivot space
Abstract
We introduce an inexact indexing scheme where, at index building time, training queries drawn from the database are used to fit one linear regression model for each object to be indexed. The response variable is the distance from the object to the query. The predictor variables are the distances from the query to each of a set of pivot objects. At search time, the models can provide distance estimates or probabilities of inclusion in the correct result, either of which can be used to rank the objects for an inexact search where the true distances are calculated in the resulting order, up to a halting point. To reduce storage requirements, the coefficients can be discretized at the cost of some precision in the promise values. We evaluate our scheme on synthetic and real-world data and compare it to a permutation-based scheme that has been reported to outperform other methods in the same experimental setting. We find that, in several of our experiments, the regression-based distance estimates give better query performance than the permutation-based promise values, in some cases even when the pivot set for the regression-based scheme is reduced in order to make its memory size equal to that of the permutation-based index. Limitations of our scheme include high index building cost and vulnerability to deviation from the model assumptions.
Year
DOI
Venue
2010
10.1145/1862344.1862353
SISAP
Keywords
Field
DocType
indexing inexact proximity search,regression-based scheme,query performance,index building time,distance regression,regression-based distance estimate,high index building cost,permutation-based promise value,inexact indexing scheme,distance estimate,pivot space,permutation-based index,permutation-based scheme,indexation,linear regression model
Discretization,Data mining,Regression,Computer science,Permutation,Search engine indexing,Algorithm,Artificial intelligence,Proximity search,Machine learning,Linear regression
Conference
Citations 
PageRank 
References 
6
0.47
12
Authors
2
Name
Order
Citations
PageRank
Ole Edsberg1514.13
Magnus Lie Hetland2738.04