Title
An Efficient Method of Genetic Algorithm for Text Clustering Based on Singular Value Decomposition
Abstract
In this paper, we propose a genetic algorithm (GA) method for text clustering based on singular value decomposition (SVD). The main difficulty in applying GA to text clustering is the long string representation required in a high-dimensional space, because the most straightforward and popular approach represents texts with the vector space model (VSM), in which each unique term in the vocabulary is one dimension. SVD is a well-established technique from numerical linear algebra that is used in latent semantic indexing (LSI). By employing the SVD-based document representation, LSI can overcome this problem by using statistically derived conceptual indices instead of individual words, which provides a dimension-reduced space. Genetic algorithms are search techniques that can automatically seek the optimal solution for the objective (fitness) function of an optimization problem. Used in conjunction with the reduced latent semantic structure, GA can improve clustering efficiency and accuracy. We evaluate our algorithm on the Reuters document collection. The results show that the SVD-based GA performs significantly better than a conventional GA in the vector space model.
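As an illustration only, the pipeline described in the abstract can be sketched in Python with NumPy: reduce a term-document matrix with a truncated SVD, then let a small GA search for cluster centroids in the reduced space. The toy matrix, the centroid-based chromosome encoding, the sum-of-squared-distances fitness, and every function and parameter name below are assumptions made for this sketch. The abstract does not specify the paper's encoding, genetic operators, or fitness function.

```python
import numpy as np

def lsi_reduce(term_doc, k):
    """Project documents (columns of term_doc) onto the k strongest latent dimensions."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Row j of the result is document j expressed in the k-dimensional latent space.
    return (np.diag(s[:k]) @ vt[:k]).T

def distances(docs, centroids):
    """Euclidean distance from every document to every centroid."""
    return np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)

def fitness(docs, centroids):
    """Assumed fitness: negative sum of squared distances to the nearest centroid."""
    return -np.sum(distances(docs, centroids).min(axis=1) ** 2)

def ga_cluster(docs, n_clusters, pop_size=20, generations=50, mut_sigma=0.1, seed=0):
    """Evolve centroid sets; each chromosome holds n_clusters centroids."""
    rng = np.random.default_rng(seed)
    n_docs = docs.shape[0]
    # Initialise every chromosome with centroids copied from random documents.
    pop = np.stack([docs[rng.choice(n_docs, n_clusters, replace=False)]
                    for _ in range(pop_size)])
    for _ in range(generations):
        scores = np.array([fitness(docs, c) for c in pop])
        # Elitism: the better half survives unchanged.
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.choice(len(elite), 2, replace=False)]
            # Uniform crossover: each centroid comes from one of the two parents.
            mask = rng.random(n_clusters) < 0.5
            child = np.where(mask[:, None], a, b)
            # Gaussian mutation of all centroid coordinates.
            children.append(child + rng.normal(0.0, mut_sigma, child.shape))
        pop = np.concatenate([elite, np.stack(children)])
    best = max(pop, key=lambda c: fitness(docs, c))
    return distances(docs, best).argmin(axis=1), best

# Toy term-document matrix (rows: terms, columns: documents) with two topics.
A = np.array([
    [2, 1, 2, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)

docs = lsi_reduce(A, k=2)                 # 6 documents, 2 latent dimensions
labels, centroids = ga_cluster(docs, n_clusters=2)
print(labels)
```

In this sketch each chromosome stores `n_clusters * k` numbers rather than one gene per vocabulary term, which is the shortening of the string representation that the abstract attributes to working in the reduced space.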
Year
2007
DOI
10.1109/CIT.2007.197
Venue
CIT
Keywords
latent semantic indexing,pattern clustering,string representation,vector space model,statistical analysis,numerical linear algebra,long string representation,optimization problem,genetic algorithm,svd-based document representation,svd-based ga,genetic algorithms,clustering efficiency,high dimensional space,conventional ga,efficient method,text analysis,reuter document collection,singular value decomposition,text clustering,fitness function
Field
Singular value decomposition,Pattern recognition,Computer science,Document clustering,Fitness function,Artificial intelligence,Vector space model,Cluster analysis,Optimization problem,Numerical linear algebra,Genetic algorithm
DocType
Conference
ISBN
978-0-7695-2983-7
Citations
1
PageRank
0.35
References
11
Authors
2
Name, Order, Citations, PageRank
Wei Song, 1, 113, 15.51
Soon Cheol Park, 2, 197, 14.78