Title
Hybrid approach for visualization of documents clusters using GHSOM and Sammon projection
Abstract
This paper presents a hybrid approach to the visualization of document sets. It combines a hierarchical clustering method based on the Growing Hierarchical Self-Organizing Map (GHSOM) algorithm with Sammon projection. Algorithms based on self-organizing maps provide a robust clustering method that suits the visualization of larger numbers of documents in grid-based 2D maps. Sammon projection is a nonlinear projection method suited mostly to visualizing smaller sets of objects on projection-based maps, which are usually 2D. Here we implemented and tested a combination of these approaches. GHSOM first organizes the initial set of documents into subsets of similar documents. The clusters at the end of the clustering phase contain fewer inputs, and Sammon maps are then created for them so that the documents within each of these clusters can also be distinguished. A method for extracting characteristic terms, based on information gain analysis, was used to describe the clusters. The existing JBOWL library was used to implement the hybrid algorithm, and English-language documents were used for testing.
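The Sammon step applied to each final cluster can be illustrated with a small sketch. It is not the authors' JBOWL-based implementation. It is a minimal, dependency-free Python illustration that makes two assumptions: each document is already a numeric term vector (for example, TF-IDF weights), and the cluster contains no duplicate vectors. It minimizes Sammon's stress, E = (1 / sum D_ij) * sum (D_ij - d_ij)^2 / D_ij, with both sums taken over pairs i < j. Here D_ij is the distance between documents in term space and d_ij is their distance on the map. The optimizer is a substitute: plain gradient descent with a step-halving rule, not Sammon's original pseudo-Newton update. The function names (`pairwise_distances`, `sammon_stress`, `stress_gradient`, `sammon`), the initial step size, the 1.2/0.5 step factors and the toy vectors are all hypothetical.

```python
import math
import random


def pairwise_distances(points):
    """Full matrix of Euclidean distances between all pairs of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def sammon_stress(D, Y):
    """Sammon's stress of the map Y, given the input-space distance matrix D."""
    d = pairwise_distances(Y)
    n = len(Y)
    num, c = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            c += D[i][j]
            num += (D[i][j] - d[i][j]) ** 2 / D[i][j]
    return num / c


def stress_gradient(D, Y, c, eps=1e-12):
    """Gradient of the stress with respect to every map coordinate."""
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for k in range(n):
        for j in range(n):
            if j == k:
                continue
            d_kj = max(math.dist(Y[k], Y[j]), eps)  # avoid dividing by zero for coincident map points
            coeff = (D[k][j] - d_kj) / (D[k][j] * d_kj)
            for m in range(dim):
                grad[k][m] -= 2.0 / c * coeff * (Y[k][m] - Y[j][m])
    return grad


def sammon(X, dim=2, iters=300, seed=0):
    """Project the vectors X to `dim` dimensions; returns (map, stress history)."""
    n = len(X)
    if n < 2:
        raise ValueError("need at least two points")
    D = pairwise_distances(X)
    if any(D[i][j] == 0.0 for i in range(n) for j in range(i + 1, n)):
        raise ValueError("Sammon stress is undefined for duplicate points")
    c = sum(D[i][j] for i in range(n) for j in range(i + 1, n))
    spread = c / (n * (n - 1) / 2)  # mean input-space distance
    rng = random.Random(seed)
    Y = [[rng.uniform(-spread, spread) for _ in range(dim)] for _ in range(n)]
    step = spread ** 2  # assumed initial step; squaring makes it independent of the units of X
    history = [sammon_stress(D, Y)]
    for _ in range(iters):
        g = stress_gradient(D, Y, c)
        trial = [[y - step * gy for y, gy in zip(row, grad_row)] for row, grad_row in zip(Y, g)]
        s = sammon_stress(D, trial)
        if s < history[-1]:  # accept, and try a slightly larger step next time
            Y, step = trial, step * 1.2
            history.append(s)
        else:  # reject and halve the step, so the stress never increases
            step *= 0.5
            history.append(history[-1])
    return Y, history


if __name__ == "__main__":
    # Hypothetical term vectors for one small final cluster (two tight sub-groups).
    cluster = [
        [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
        [1.0, 1.0, 1.0], [1.1, 1.0, 1.0], [1.0, 1.1, 1.0],
    ]
    coords, history = sammon(cluster)
    print(f"stress {history[0]:.4f} -> {history[-1]:.4f}")
    for x, y in coords:
        print(f"{x:8.3f} {y:8.3f}")
```

The stress divides each pair's error by D_ij, so errors on small distances weigh more heavily. This is why Sammon maps preserve the local structure of a small, homogeneous cluster well. Each iteration of this sketch also costs O(n^2) distance computations. Both points fit the abstract's choice to apply the projection only to the small clusters left at the end of the GHSOM phase rather than to the whole collection.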
Year: 2013
DOI: 10.1109/SACI.2013.6608994
Venue: Applied Computational Intelligence and Informatics
Keywords: data visualisation, document handling, information analysis, natural language processing, pattern clustering, self-organising feature maps, English language, GHSOM, Sammon projection, clustering phase, documents cluster visualization, grid-based 2D maps, growing hierarchical self-organizing maps algorithm, hierarchical clustering method, hybrid algorithm, information gain analysis, library JBOWL, nonlinear projection method, robust clustering method
Field: Hierarchical clustering, Sammon mapping, Data mining, Canopy clustering algorithm, Fuzzy clustering, CURE data clustering algorithm, Pattern recognition, Correlation clustering, Computer science, Artificial intelligence, Cluster analysis, Brown clustering
DocType: Conference
ISBN: 978-1-4673-6397-6
Citations: 1
PageRank: 0.35
References: 8
Authors: 2
Name | Order | Citations | PageRank
Peter Butka | 1 | 41 | 8.44
Jana Pócsová | 2 | 33 | 6.02