Abstract | ||
---|---|---|
Latent Semantic Indexing (LSI) has been widely used in information retrieval because it handles the problems of polysemy and synonymy effectively. However, LSI is computationally intensive because of the cost of the singular value decomposition and filtering operations it involves. This paper presents MR-LSI, a MapReduce-based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is evaluated first in a small-scale experimental cluster environment and then in large-scale simulation environments. By partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes, the overhead of the MR-LSI algorithm is reduced significantly while a high level of accuracy in retrieving documents of user interest is maintained. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments in which the computing nodes have varied resources. |
Year | Venue | Keywords |
---|---|---|
2014 | COMPUTING AND INFORMATICS | Information retrieval, latent semantic indexing, MapReduce, load balancing, genetic algorithms |
Field | DocType | Volume |
Singular value decomposition, Latent semantic indexing, Information retrieval, Load balancing (computing), Computer science, Symmetric multiprocessor system, Filter (signal processing), Theoretical computer science, Genetic algorithm, Distributed computing, Scalability, Polysemy | Journal | 33 |
Issue | ISSN | Citations |
2 | 1335-9150 | 0 |
PageRank | References | Authors |
0.34 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yang Liu 0010 | 1 | 0 | 0.34 |
M. Li | 2 | 50 | 3.23 |
Mukhtaj Khan | 3 | 0 | 0.34 |
Man Qi | 4 | 1 | 0.70 |