Title
A MapReduce based distributed LSI
Abstract
Latent Semantic Indexing (LSI) is a widely used text mining technique because it deals effectively with the problems of synonymy and polysemy when the matrix is of a manageable scale. However, LSI is computationally intensive, especially when processing large-scale data. An effective solution is to increase the computational power available to LSI by using multiple computing nodes. In this paper we propose a novel MapReduce-based distributed LSI that uses the Hadoop distributed computing architecture to run the K-means algorithm to cluster the documents and then applies LSI to the clustered results. We evaluate the performance of the proposed MapReduce-based LSI and compare it with standalone LSI. The results show a substantial improvement in LSI's speed.
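The pipeline described in the abstract (K-means expressed as map and reduce steps, followed by LSI on each cluster) can be sketched roughly as follows. This is a minimal single-process illustration, not the authors' Hadoop implementation: the function names (`kmeans_map`, `kmeans_reduce`, `mapreduce_kmeans`, `lsi`, `clustered_lsi`), the use of NumPy, the Euclidean distance, and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np


def kmeans_map(doc_vectors, centroids):
    """Map step: emit (index of nearest centroid, (vector, 1)) for each document."""
    for vec in doc_vectors:
        dists = np.linalg.norm(centroids - vec, axis=1)
        yield int(np.argmin(dists)), (vec, 1)


def kmeans_reduce(pairs, old_centroids):
    """Reduce step: average the vectors emitted under each key into new centroids."""
    sums = {}
    for key, (vec, count) in pairs:
        total, n = sums.get(key, (np.zeros_like(vec), 0))
        sums[key] = (total + vec, n + count)
    new_centroids = old_centroids.copy()  # a cluster with no documents keeps its old centroid
    for key, (total, n) in sums.items():
        new_centroids[key] = total / n
    return new_centroids


def mapreduce_kmeans(docs, k, iterations=10, seed=0):
    """Run a fixed number of map/reduce rounds (hypothetical driver, no Hadoop involved)."""
    docs = np.asarray(docs, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = docs[rng.choice(len(docs), size=k, replace=False)].copy()
    for _ in range(iterations):
        centroids = kmeans_reduce(kmeans_map(docs, centroids), centroids)
    labels = np.array([key for key, _ in kmeans_map(docs, centroids)])
    return labels, centroids


def lsi(term_doc, rank):
    """Truncated SVD of a term-by-document matrix, the core computation of LSI."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    r = min(rank, len(s))
    return U[:, :r], s[:r], Vt[:r, :]


def clustered_lsi(docs, k, rank):
    """Cluster document vectors, then apply LSI to each cluster separately."""
    docs = np.asarray(docs, dtype=float)
    labels, _ = mapreduce_kmeans(docs, k)
    return {int(c): lsi(docs[labels == c].T, rank) for c in np.unique(labels)}
```

Under these assumptions, each SVD runs on a smaller per-cluster matrix rather than on the full collection, which is where a speed gain would be expected to come from; in a real deployment the map and reduce functions would run as Hadoop tasks on separate nodes.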
Year
2010
DOI
10.1109/FSKD.2010.5569083
Venue
FSKD
Keywords
mapreduce,latent semantic indexing,pattern clustering,svd,k-mean,hadoop distributed computing architecture,multiple computing nodes,matrix algebra,large scale data processing,text mining technology,indexing,distributed lsi,data mining,text analysis,lsi,distributed computing,clustering algorithms,k means,semantics,matrix decomposition,k means algorithm,k mean,computational modeling,text mining
Field
Data mining,Latent semantic indexing,Matrix algebra,Computer science,Search engine indexing,Artificial intelligence,Cluster analysis,Computer engineering,k-means clustering,Singular value decomposition,Matrix decomposition,Machine learning,Semantics
DocType
Conference
Volume
6
ISBN
978-1-4244-5931-5
Citations
2
PageRank
0.41
References
1
Authors
5
Name                    Order   Citations   PageRank
Yang Liu                1       151         11.93
Maozhen Li              2       1354        183.79
Suhel Hammoud           3       130         7.82
Nasullah Khalid Alham   4       104         6.80
Mahesh Ponraj           5       21          3.55