Title
An Index Scheme for Similarity Search on Cloud Computing using MapReduce over Docker Container
Abstract
We consider the problem of similarity search over large datasets in a distributed environment. The proposed framework integrates the VP-Tree algorithm on top of the MapReduce framework to achieve good performance while meeting the system's scalability and fault-tolerance requirements as the data scales up. Since the VP-Tree algorithm was originally implemented for partitioning and searching data on local disk, we propose a new approach for using it in a parallel environment. The key point of the VP-Tree algorithm is that it distributes similar data points into groups, thereby reducing the number of data points that must be scanned during the search stage. Consequently, the response time of the entire system is improved. In addition, we use the open-source computer vision library OpenCV to detect similarity among images in the dataset. We evaluate the performance of our proposed framework on synthetic data to demonstrate the benefits of our approach. The experiment shows that our proposed framework achieves a 57% improvement in response time compared with running the search job on the traditional Hadoop framework. We also compare the running time of our application on Docker containers against a VM-based environment. The results indicate that deploying our system on Docker containers provides higher performance than the VM-based environment in terms of response time.
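To illustrate how a VP-Tree groups similar points and lets a search skip most of the data, here is a minimal single-machine sketch. It is not the authors' implementation: the names `VPNode`, `build_vp_tree`, and `nearest` are hypothetical, the data are 2-D points rather than image features, and plain Euclidean distance stands in for the OpenCV-based similarity measure. The MapReduce distribution step is not shown.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class VPNode:
    """One vantage point, a radius, and the two groups it separates."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # the vantage point
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points closer than radius
        self.outside = outside  # subtree of points at radius or farther


def build_vp_tree(points, dist=euclidean, rng=None):
    """Recursively split points around a randomly chosen vantage point."""
    if not points:
        return None
    rng = rng or random.Random(0)
    pts = list(points)
    vp = pts.pop(rng.randrange(len(pts)))
    if not pts:
        return VPNode(vp, 0.0, None, None)
    dists = [dist(vp, p) for p in pts]
    radius = sorted(dists)[len(dists) // 2]  # median distance
    inside = [p for p, d in zip(pts, dists) if d < radius]
    outside = [p for p, d in zip(pts, dists) if d >= radius]
    return VPNode(vp, radius,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))


def nearest(root, query, dist=euclidean):
    """Return (best point, its distance, number of nodes visited)."""
    best = [None, float("inf")]
    visited = 0

    def search(node):
        nonlocal visited
        if node is None:
            return
        visited += 1
        d = dist(query, node.point)
        if d < best[1]:
            best[0], best[1] = node.point, d
        if d < node.radius:
            search(node.inside)
            # Outside points are at least (radius - d) away from the query.
            if d + best[1] >= node.radius:
                search(node.outside)
        else:
            search(node.outside)
            # Inside points are more than (d - radius) away from the query.
            if d - best[1] < node.radius:
                search(node.inside)

    search(root)
    return best[0], best[1], visited


if __name__ == "__main__":
    gen = random.Random(42)
    data = [(gen.random(), gen.random()) for _ in range(1000)]
    tree = build_vp_tree(data)
    point, distance, visited = nearest(tree, (0.5, 0.5))
    print(point, distance, f"visited {visited} of {len(data)} points")
```

The pruning tests rely only on the triangle inequality, so any true metric could replace `euclidean`. The `visited` counter shows how many points were actually compared, which is the quantity the abstract says the VP-Tree reduces.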
Year: 2016
DOI: 10.1145/2857546.2857607
Venue: IMCOM
Field: Data point, Data mining, Distributed Computing Environment, Computer science, Response time, Synthetic data, Fault tolerance, Nearest neighbor search, Cloud computing, Scalability
DocType: Conference
ISBN: 978-1-4503-4142-4
Citations: 5
PageRank: 0.47
References: 8
Authors: 6
Name                Order  Citations  PageRank
DT-Tri Nguyen       1      5          0.47
Chan Ho Yong        2      5          0.81
Xuan-Qui Pham       3      33         5.50
Huu-Quoc Nguyen     4      17         3.26
Ton Thi Kim Loan    5      5          0.47
Eui-Nam Huh         6      1036       113.46