Title
An Incremental Prefix Filtering Approach for the All Pairs Similarity Search Problem
Abstract
Given a set of records, a similarity function, and a threshold value t, we investigate the problem of finding all pairs of records whose similarity is above t. We propose several optimizations to existing approaches to this problem. Our algorithm outperforms state-of-the-art algorithms on large, high-dimensional datasets. The speedup we achieved ranged from 30% to 4x, depending on the similarity threshold and the dataset properties.
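To make the problem statement concrete, the following is a minimal Python sketch of the all-pairs task, together with the classic prefix-filtering idea that the title refers to. It is not the authors' incremental algorithm. It assumes records are sets of tokens, Jaccard similarity as the similarity function, and a pair qualifying when its similarity is at least t with 0 < t <= 1. The function names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import ceil


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def all_pairs_brute_force(records, t):
    """Baseline: compute the similarity of every pair of records."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if jaccard(a, b) >= t
    }


def all_pairs_prefix_filter(records, t):
    """Same result, but only verify pairs whose prefixes share a token."""
    # Global token order, rare tokens first, so prefixes hold rare tokens.
    freq = defaultdict(int)
    for r in records:
        for tok in r:
            freq[tok] += 1

    index = defaultdict(list)  # token -> earlier record ids with it in their prefix
    result = set()
    for i, r in enumerate(records):
        tokens = sorted(r, key=lambda tok: (freq[tok], tok))
        # A match needs an overlap of at least ceil(t * |r|) tokens, so it must
        # hit one of the first |r| - ceil(t * |r|) + 1 tokens. The small epsilon
        # guards against float round-up, which would shorten the prefix.
        prefix_len = len(tokens) - ceil(t * len(tokens) - 1e-9) + 1
        prefix = tokens[:prefix_len]

        candidates = set()
        for tok in prefix:
            candidates.update(index[tok])
        for j in candidates:  # verification step
            if jaccard(records[j], r) >= t:
                result.add((j, i))
        for tok in prefix:
            index[tok].append(i)
    return result


records = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"x", "y"},
    {"x", "y", "z"},
    {"a", "x"},
]
print(sorted(all_pairs_prefix_filter(records, 0.6)))  # [(0, 1), (2, 3)]
```

The filter never changes the answer. It only reduces how many pairs reach the verification step, which is where the savings over the brute-force baseline come from on large inputs.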
Year
2010
DOI
10.1109/APWeb.2010.30
Venue
APWeb
Keywords
optimisation, database management systems, all pairs similarity search problem, state-of-the-art algorithm, existing approach, incremental prefix filtering approach, similarity threshold, search problems, pairs similarity search problem, similarity function, dataset property, optimizations, threshold value, high-dimensional datasets, query formulation, filtering, similarity search, collaboration, indexing, length measurement, databases, optimization, upper bound
Field
Data mining, Upper and lower bounds, Computer science, Length measurement, Search engine indexing, Filter (signal processing), Threshold limit value, Prefix, Nearest neighbor search, Speedup
DocType
Conference
ISBN
978-1-4244-6600-9
Citations
2
PageRank
0.38
References
7
Authors
4
Name                Order  Citations  PageRank
Hoang Thanh Lam     1      108        8.49
Dinh Viet Dung      2      2          0.38
Raffaele Perego     3      1471       108.91
Fabrizio Silvestri  4      1819       107.29