Title
Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures
Abstract
This paper proposes a self-organized genetic algorithm for text clustering based on an ontology method. A common problem in text clustering is that each document is represented as a bag of words, so the conceptual similarity between terms is ignored. We take advantage of thesaurus-based and corpus-based ontologies to overcome this problem. However, the traditional corpus-based method is rather difficult to apply. In this article, a transformed latent semantic indexing (LSI) model, which can appropriately capture associated semantic similarity, is proposed and demonstrated as a corpus-based ontology. To investigate how ontology methods can be used effectively in text clustering, two hybrid strategies using various similarity measures are implemented. Experimental results show that our genetic algorithm combined with the ontology strategy (the transformed LSI-based measure combined with the thesaurus-based measure) clearly outperforms the same algorithm using traditional similarity measures. Our clustering algorithm also improves performance compared with a standard GA and k-means in the same similarity environments.
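The abstract combines a corpus-based (LSI) term similarity with a thesaurus-based one so that documents sharing related, not only identical, words are judged similar. The Python sketch below illustrates that general idea under loudly stated assumptions. It uses plain rank-k LSI, not the paper's transformed LSI model (whose details are not given here). It substitutes a hypothetical toy synonym table for a real thesaurus such as WordNet. It merges the two measures with a simple weighted sum using an assumed weight `alpha`. Finally, it lifts term similarity to document similarity with the soft-cosine formula. All function names (`lsi_term_vectors`, `thesaurus_similarity`, `hybrid_term_matrix`, `soft_cosine`) and the toy data are illustrative inventions, not taken from the paper.

```python
import numpy as np


def lsi_term_vectors(term_doc, k):
    """Project each term (row of the term-document matrix) into a k-dim LSI space.

    Plain rank-k LSI (an assumption, not the paper's transformed model):
    term vectors are the rows of U_k * S_k from the SVD term_doc = U S V^T.
    """
    if not 1 <= k <= min(term_doc.shape):
        raise ValueError("k must be between 1 and min(n_terms, n_docs)")
    u, s, _vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(a @ b / (na * nb))


def thesaurus_similarity(t1, t2, synsets):
    """Hypothetical stand-in for a thesaurus measure such as one built on WordNet:
    1.0 for identical terms or terms sharing a synonym group, else 0.0."""
    if t1 == t2:
        return 1.0
    return 1.0 if synsets.get(t1, set()) & synsets.get(t2, set()) else 0.0


def hybrid_term_matrix(terms, term_doc, synsets, k=2, alpha=0.5):
    """S[i, j] = alpha * LSI similarity + (1 - alpha) * thesaurus similarity.

    Negative LSI cosines are clipped to 0 so that S stays non-negative.
    """
    vecs = lsi_term_vectors(term_doc, k)
    n = len(terms)
    s = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            lsi = max(0.0, cosine(vecs[i], vecs[j]))
            thes = thesaurus_similarity(terms[i], terms[j], synsets)
            s[i, j] = alpha * lsi + (1.0 - alpha) * thes
    return s


def soft_cosine(a, b, s):
    """Document similarity that credits related terms via the term matrix S:
    a^T S b / sqrt(a^T S a * b^T S b)."""
    denom = np.sqrt((a @ s @ a) * (b @ s @ b))
    return 0.0 if denom == 0.0 else float(a @ s @ b / denom)


# Toy corpus: rows are terms, columns are four documents.
terms = ["car", "automobile", "engine", "fruit", "apple"]
term_doc = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # fruit
    [0, 0, 1, 1],  # apple
], dtype=float)
synsets = {"car": {"motor_vehicle"}, "automobile": {"motor_vehicle"}}

S = hybrid_term_matrix(terms, term_doc, synsets, k=2, alpha=0.5)
d0, d1, d2 = term_doc[:, 0], term_doc[:, 1], term_doc[:, 2]
print(round(cosine(d0, d1), 3))          # bag-of-words cosine, prints 0.5
print(round(soft_cosine(d0, d1, S), 3))  # hybrid measure, prints 1.0
```

On this toy data the two vehicle documents share only "engine", so their bag-of-words cosine is 0.5. The hybrid measure also credits the car/automobile relation (from both the synonym table and the shared LSI dimension) and raises the score to about 1.0, while the vehicle and fruit documents stay near 0. This is the kind of conceptual similarity that, according to the abstract, the bag-of-words representation ignores. How the paper actually weights or transforms the two measures is not specified here.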
Year: 2009
DOI: 10.1016/j.eswa.2008.12.046
Venue: Expert Syst. Appl.
Keywords: various semantic similarity measure, ontology strategy, genetic algorithm, conceptual similarity, WordNet, various similarity measure, traditional similarity measure, similarity environment, ontology, text clustering, latent semantic indexing, clustering algorithm, corpus-based ontology, ontology method, associated semantic similarity, semantic similarity, k-means, self organization, bag of words
Field: Semantic similarity, Canopy clustering algorithm, Data mining, Fuzzy clustering, Ontology-based data integration, Correlation clustering, Document clustering, Computer science, Cluster analysis, WordNet
DocType: Journal
Volume: 36
Issue: 5
Journal: Expert Systems With Applications
Citations: 40
PageRank: 1.15
References: 19
Authors: 3
Name
Order
Citations
PageRank
Wei Song111315.51
Cheng Hua Li219712.83
Soon Cheol Park319714.78