Title | ||
---|---|---|
An Improved Genetic Algorithm for Document Clustering with Semantic Similarity Measure |
Abstract | ||
---|---|---|
This paper proposes a self-organized genetic algorithm for document clustering based on a semantic similarity measure. Traditionally, a document is represented as a string of words, and the conceptual similarity between words is ignored. We use a thesaurus-based ontology to overcome this problem. To investigate how the ontology method can be used effectively in document clustering, we implement a hybrid strategy that combines the thesaurus-based semantic similarity measure with the vector space model (VSM) measure to assess the similarity between documents more accurately. Considering the interplay between population diversity and selective pressure, we also put forward an approach of dynamic evolution operators. In our experiments, two data sets of 200 and 600 documents excerpted from the Reuters-21578 corpus are used for testing. The results show that the genetic algorithm with the hybrid semantic strategy (the combination of the thesaurus-based and VSM-based measures) outperforms the same algorithm with the VSM measure alone. Our clustering algorithm also improves precision and recall compared with k-means under the same similarity conditions. |
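
The hybrid strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so everything here is an assumption made for illustration: the linear weighting with `alpha`, the hypothetical `TOY_THESAURUS` word-to-concept map (standing in for a real thesaurus such as WordNet), and the function names. It only shows how a concept-level score can recover similarity that a purely word-level VSM score misses.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (an assumption, not from the paper):
# maps surface words to a shared concept label.
TOY_THESAURUS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "bank": "finance",
    "loan": "finance",
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vsm_similarity(doc_a, doc_b):
    """Word-level similarity: cosine over raw term counts."""
    return cosine(Counter(doc_a.lower().split()),
                  Counter(doc_b.lower().split()))


def concept_similarity(doc_a, doc_b, thesaurus=TOY_THESAURUS):
    """Concept-level similarity: cosine over thesaurus concept counts."""
    def concepts(doc):
        return Counter(thesaurus[w] for w in doc.lower().split()
                       if w in thesaurus)
    return cosine(concepts(doc_a), concepts(doc_b))


def hybrid_similarity(doc_a, doc_b, alpha=0.5):
    """Assumed linear blend of the concept-level and word-level scores."""
    return (alpha * concept_similarity(doc_a, doc_b)
            + (1.0 - alpha) * vsm_similarity(doc_a, doc_b))


if __name__ == "__main__":
    a, b = "car loan", "automobile bank"
    print("VSM only:", vsm_similarity(a, b))
    print("Hybrid  :", hybrid_similarity(a, b))
```

In this toy case the two documents share no words, so the VSM score is zero, while the concept-level score is high because both map to the same concepts. A clustering algorithm (genetic or k-means) would then use the blended score as its document similarity.
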
Year | DOI | Venue |
---|---|---|
2008 | 10.1109/ICNC.2008.374 | ICNC |
Keywords | Field | DocType |
document clustering,genetic algorithm,semantic similarity measure,conceptual similarity,sole vsm measure,thesaurus-based measure,similarity environment,improved genetic algorithm,thesaurus-based semantic similarity measure,vsm-based measure,clustering algorithm,clustering algorithms,algorithm design and analysis,self organization,wordnet,clustering,gallium,genetic algorithms,k means,vector space model,semantic similarity | Semantic similarity,Population,Fuzzy clustering,Data mining,Computer science,Document clustering,Precision and recall,Vector space model,Cluster analysis,WordNet | Conference |
Citations | PageRank | References |
0 | 0.34 | 9 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Wei Song | 1 | 113 | 15.51 |
Soon Cheol Park | 2 | 197 | 14.78 |