Title
Analysis of Web Clustering Based on Genetic Algorithm with Latent Semantic Indexing Technology
Abstract
This paper constructs a latent semantic text model and applies a genetic algorithm (GA) to it for web clustering. The main difficulty in applying GA to text clustering is that the feature space has thousands or even tens of thousands of dimensions. Latent semantic indexing (LSI) is a successful technique that attempts to uncover the latent semantic structure in textual data; it also reduces this large space to a smaller one and provides a robust space for clustering. GA is a search technique that efficiently evolves toward the optimal solution of the problem. Evolving in the reduced latent semantic space, GA can improve clustering accuracy and speed, which makes it well suited to real-time clustering. We use the SSTRESS criterion to analyze the dissimilarity between the original term-by-document corpus matrix and its approximate decomposition at different ranks, and relate it to the performance of our algorithm in the reduced space. Results on Reuters text clustering demonstrate the superiority of GA applied in the LSI model over K-means and conventional GA in the vector space model (VSM).
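The dimensionality reduction described in the abstract can be illustrated with a minimal sketch. It assumes LSI is computed as a truncated singular value decomposition of the term-by-document matrix, which is the usual formulation. The toy matrix, the function names `lsi_reduce` and `relative_error`, and the use of relative Frobenius error are all assumptions made for illustration; in particular, the Frobenius error is a stand-in and is not the SSTRESS criterion used in the paper, whose exact form the abstract does not give.

```python
import numpy as np


def lsi_reduce(A, k):
    """Truncated SVD of a term-by-document matrix A (terms x documents).

    Returns (doc_coords, A_k): doc_coords holds one k-dimensional row per
    document, and A_k is the rank-k approximation of A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    A_k = (U_k * s_k) @ Vt_k      # rank-k reconstruction of A
    doc_coords = Vt_k.T * s_k     # documents as rows in the latent space
    return doc_coords, A_k


def relative_error(A, A_k):
    """Relative Frobenius error between A and its approximation
    (an illustrative stand-in, not the paper's SSTRESS criterion)."""
    return np.linalg.norm(A - A_k) / np.linalg.norm(A)


# Hypothetical toy corpus: 5 terms (rows) x 4 documents (columns).
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 1.0, 1.0, 2.0],
    [1.0, 0.0, 0.0, 1.0],
])

if __name__ == "__main__":
    for k in range(1, 5):
        coords, A_k = lsi_reduce(A, k)
        print(k, coords.shape, relative_error(A, A_k))
```

A clustering method such as GA or K-means would then operate on the rows of `doc_coords` instead of the full-dimensional document vectors; lowering `k` shrinks the search space at the cost of a larger approximation error.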
Year
2007
DOI
10.1109/ALPIT.2007.77
Venue
ALPIT
Keywords
latent semantic indexing technology, real time clustering, large space, vector space model, robust space, web clustering, genetic algorithm, feature space, conventional ga, clustering accuracy, reduced space, text clustering, algorithm design and analysis, real time, knowledge management, genetic algorithms, space technology, matrix decomposition, clustering algorithms, latent semantic indexing, indexing, k means
Field
Data mining, Fuzzy clustering, Canopy clustering algorithm, CURE data clustering algorithm, Data stream clustering, Pattern recognition, Correlation clustering, Computer science, Probabilistic latent semantic analysis, Artificial intelligence, Constrained clustering, Cluster analysis
DocType
Conference
ISBN
978-0-7695-2930-1
Citations
2
PageRank
0.41
References
6
Authors
2
Name             Order  Citations  PageRank
Wei Song         1      113        15.51
Soon Cheol Park  2      197        14.78