Title
The Parameter-less Randomized Gravitational Clustering algorithm with online clusters' structure characterization.
Abstract
Although clustering is an unsupervised learning approach, most clustering algorithms require parameters (such as the number of clusters, a minimum density or a distance threshold) to be set in advance in order to work properly. Moreover, discovering an appropriate set of clusters is a difficult task, since clusters can have any shape, size and density, and it becomes harder in the presence of noise. In fact, noise can deteriorate the results of many clustering techniques that are based on the least squares estimate. This paper presents a data clustering algorithm that does not require a parameter setting process, the Parameter-less Randomized Gravitational Clustering algorithm (Pl-Rgc), and combines it with a mechanism, based on micro-cluster ideas, for representing a cluster as a set of prototypes. To remove the parameter setting of the Randomized Gravitational Clustering (Rgc) algorithm, a set of parameter estimation strategies previously developed for Rgc is combined with a newly developed stopping criterion based on the average number of points merged iteration by iteration. The performance of the proposed Pl-Rgc algorithm is evaluated experimentally on two types of synthetic data sets (data sets with Gaussian clusters and data sets with non-parametric clusters) and on two types of real data sets (five classic machine learning classification data sets and one intrusion detection data set). Our results show that the proposed mechanism is able to deal with noise, finds the appropriate number of clusters and finds an appropriate set of cluster prototypes regardless of the type of data it is working on.
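The abstract's idea of moving points under a gravity-like attraction, merging them when they meet, and stopping once merges become rare can be illustrated with a short sketch. This is not the authors' Pl-Rgc implementation: the function name `gravitational_clustering`, every parameter, and all default values below are hypothetical, and the parameters are hard-coded here even though the paper's contribution is precisely to estimate them automatically. The stopping rule is a simple stand-in (a moving average of merges per iteration), assumed for illustration only.

```python
import math
import random


def find(parent, i):
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def gravitational_clustering(points, g=0.1, decay=0.01, eps=1e-2,
                             window=10, min_avg_merges=0.5,
                             max_iter=500, seed=0):
    """Simplified randomized gravitational clustering (illustrative only).

    All parameters are assumptions made for this sketch:
    g is the initial gravitational constant, decay its per-iteration
    reduction, eps the merge distance, and window/min_avg_merges define
    a stand-in stopping rule based on the recent average number of merges.
    """
    rng = random.Random(seed)
    pos = [list(p) for p in points]   # working copy; points drift over time
    n = len(pos)
    parent = list(range(n))           # union-find forest of merged points
    history = []                      # merges observed in each iteration
    total_merges = 0

    for _ in range(max_iter):
        merges = 0
        for i in range(n):
            j = rng.randrange(n)      # random interaction partner
            ri, rj = find(parent, i), find(parent, j)
            if ri == rj:
                continue              # same point or already the same cluster
            d = [b - a for a, b in zip(pos[i], pos[j])]
            dist = math.sqrt(sum(c * c for c in d))
            if dist <= eps:
                parent[rj] = ri       # close enough: merge the two clusters
                merges += 1
                continue
            # Inverse-square attraction along d; the cap at 0.5 means the
            # two points can at most meet at their midpoint.
            step = min(g / dist ** 3, 0.5)
            for k in range(len(d)):
                pos[i][k] += step * d[k]
                pos[j][k] -= step * d[k]
        g *= 1.0 - decay              # gravity weakens over time
        history.append(merges)
        total_merges += merges
        # Stand-in stopping rule: once merging has started, stop when the
        # average number of merges over the last `window` iterations is low.
        if (total_merges > 0 and len(history) >= window
                and sum(history[-window:]) / window < min_avg_merges):
            break

    ids = {}
    return [ids.setdefault(find(parent, i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    print(gravitational_clustering(data))
```

The function returns one integer label per input point, numbered in order of first appearance. Because partners are drawn at random, results depend on `seed`; fixing it makes a run repeatable.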
Year: 2014
DOI: 10.1007/s13748-014-0054-5
Venue: Progress in AI
Keywords: Data mining, Data clustering, Online clusters’ characterization, Gravity base data analysis
Field: Data mining, Canopy clustering algorithm, CURE data clustering algorithm, Affinity propagation, Correlation clustering, Computer science, Determining the number of clusters in a data set, Constrained clustering, Artificial intelligence, Cluster analysis, Machine learning, Single-linkage clustering
DocType: Journal
Volume: 2
Issue: 4
ISSN: 2192-6360
Citations: 0
PageRank: 0.34
References: 20
Authors: 4
Name              Order  Citations  PageRank
Jonatan Gómez     1      241        29.70
Elizabeth Leon    2      33         5.26
Olfa Nasraoui     3      1515       164.53
Fabián Giraldo    4      1          0.69