Title
A MapReduce-based filtering algorithm for vector similarity join
Abstract
Vector similarity join is a fundamental operation in data cleaning and data analysis. Because most objects can be represented as feature vectors, finding pairs of similar objects is an important task. However, vector similarity join is computationally expensive: a naive join compares every pair of vectors, so its cost grows with the square of the number of vectors. Many filtering techniques have been proposed to reduce this load, and algorithms for distributed systems have been studied to handle large datasets. Even state-of-the-art approaches, however, still perform a large amount of computation. In this paper, we propose a MapReduce algorithm that executes vector similarity join efficiently. The first stage of our algorithm uses prefix filtering to reduce the number of candidate pairs; the second stage computes similarities for the candidate pairs produced by the first stage. We present formulas that predict the number of candidate pairs to demonstrate the effectiveness of our algorithm. Experimental results show that our algorithm outperforms state-of-the-art MapReduce algorithms.
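The abstract does not specify the exact filter or how the two stages are wired together. As a rough illustration only, the following Python sketch simulates a two-stage pipeline of this kind in memory. It assumes cosine similarity over sparse vectors, a threshold t in (0, 1], and a rarest-dimension-first global ordering for prefixes. All function names are hypothetical; dictionaries stand in for mappers, the shuffle, and reducers, and no Hadoop API is used. It does not reproduce the paper's candidate-count prediction formulas.

```python
# Hypothetical sketch: in-memory stand-ins for the two MapReduce stages.
# Names, the cosine measure, and the rarest-first ordering are assumptions,
# not the paper's implementation.
from collections import defaultdict
from itertools import combinations
import math


def normalize(vec):
    """Scale a sparse vector {dim: weight} to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {d: w / norm for d, w in vec.items()}


def cosine(x, y):
    """Cosine similarity of two unit-normalized sparse vectors (their dot product)."""
    if len(x) > len(y):
        x, y = y, x
    return sum(w * y.get(d, 0.0) for d, w in x.items())


def prefix_dims(vec, order, t):
    """Shortest prefix (in global dimension order) whose remaining suffix has L2 norm < t."""
    dims = sorted(vec, key=order)
    # suffix_sq[i] = squared L2 norm of the weights on dims[i:]
    suffix_sq = [0.0] * (len(dims) + 1)
    for i in range(len(dims) - 1, -1, -1):
        suffix_sq[i] = suffix_sq[i + 1] + vec[dims[i]] ** 2
    k = next(i for i in range(len(dims) + 1) if suffix_sq[i] < t * t)
    return dims[:k]


def stage1_candidates(vectors, t):
    """Stage 1: prefix filtering. Two vectors become a candidate pair only if
    they share at least one dimension in their prefixes."""
    # Global order: rarest dimension first, ties broken by the dimension key.
    df = defaultdict(int)
    for vec in vectors.values():
        for d in vec:
            df[d] += 1

    def order(d):
        return (df[d], d)

    # "Map": emit (prefix dimension, vector id).
    postings = defaultdict(list)
    for vid, vec in vectors.items():
        for d in prefix_dims(vec, order, t):
            postings[d].append(vid)
    # "Reduce": all id pairs grouped under the same dimension are candidates.
    candidates = set()
    for ids in postings.values():
        candidates.update(combinations(sorted(ids), 2))
    return candidates


def stage2_verify(vectors, candidates, t):
    """Stage 2: compute the exact similarity of each candidate pair."""
    return {(a, b) for a, b in candidates if cosine(vectors[a], vectors[b]) >= t}


def similarity_join(raw_vectors, t):
    """All pairs with cosine similarity >= t, assuming 0 < t <= 1."""
    vectors = {vid: normalize(v) for vid, v in raw_vectors.items()}
    return stage2_verify(vectors, stage1_candidates(vectors, t), t)
```

For example, with data = {"a": {"x": 1.0, "y": 1.0}, "b": {"x": 1.0, "y": 0.9}, "c": {"z": 1.0}, "d": {"y": 1.0, "z": 1.0}}, similarity_join(data, 0.9) returns {("a", "b")}: stage 1 proposes only ("a", "b") and ("c", "d"), and stage 2 discards ("c", "d"), whose similarity is about 0.71. Under these assumptions the filter cannot drop a true result: if two unit vectors shared no prefix dimension, every dimension they have in common would lie in the suffix of one of them, whose norm is below t, so by Cauchy-Schwarz their dot product would also be below t.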
Year: 2013
DOI: 10.1145/2448556.2448627
Venue: ICUIMC
Keywords: computational load, fundamental operation, state-of-the-art study, candidate pair, vector similarity, feature vector, heavy computational job, vector similarity join, mapreduce algorithm, state-of-the-art mapreduce algorithm
Field: Feature vector, Computer science, Algorithm, Filter (signal processing), Theoretical computer science, Prefix, Computation
DocType: Conference
Citations: 4
PageRank: 0.43
References: 3
Authors: 4
Name            Order  Citations  PageRank
Byoungju Yang   1      12         1.31
Jaeseok Myung   2      81         6.48
Sang-goo Lee    3      832        151.04
Dongjoo Lee     4      182        12.87