Title
Gaussian clusters and noise: an approach based on the minimum description length principle
Abstract
We introduce a well-grounded quality measure, based on the minimum description length (MDL) principle, for a clustering that consists of spherical or axis-aligned normally distributed clusters together with one cluster that is uniformly distributed in an axis-aligned rectangular box. The uniform component extends the practical usability of the model, for example in the presence of noise, and using the MDL principle for model selection makes it possible to compare the quality of clusterings with different numbers of clusters. We also introduce a novel search heuristic for finding the best clustering when the number of clusters is unknown. The heuristic is based on moving points from the Gaussian clusters to the uniform one and using MDL to determine the optimal amount of noise. Tests with synthetic data that have a clear cluster structure suggest that the search method is effective in finding the intuitively correct clustering.
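The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's code-length formula or its search heuristic. It assumes a crude stand-in: a hard assignment of points, spherical Gaussians fitted by maximum likelihood, a uniform component on the bounding box of all points, and a BIC-style charge of 0.5 log n nats per parameter. The names `gaussian_nll`, `description_length` and `split_off_noise` are hypothetical and chosen only for this illustration.

```python
import math
import statistics


def gaussian_nll(points):
    """Negative log-likelihood (nats) under a spherical Gaussian fitted by ML."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    sq_dist = sum((p[j] - mean[j]) ** 2 for p in points for j in range(d))
    var = max(sq_dist / (n * d), 1e-12)  # floor avoids log(0) for degenerate clusters
    return 0.5 * n * d * (math.log(2 * math.pi * var) + 1.0)


def description_length(clusters, noise):
    """Simplified two-part code length (nats) of a hard clustering plus noise.

    Assumptions (not from the paper): the uniform component lives on the
    bounding box of all points, and each parameter costs 0.5 * log(n) nats.
    """
    clusters = [c for c in clusters if c]
    groups = clusters + ([noise] if noise else [])
    points = [p for g in groups for p in g]
    n, d = len(points), len(points[0])
    # Uniform density on the bounding box: each noise point costs log(volume).
    log_volume = sum(
        math.log(max(max(p[j] for p in points) - min(p[j] for p in points), 1e-12))
        for j in range(d)
    )
    data_cost = sum(gaussian_nll(c) for c in clusters) + len(noise) * log_volume
    # Cost of stating which component each point belongs to.
    label_cost = -sum(len(g) * math.log(len(g) / n) for g in groups)
    # d mean coordinates + 1 variance per Gaussian, 2d box corners if noise is used.
    n_params = len(clusters) * (d + 1) + (2 * d if noise else 0)
    return data_cost + label_cost + 0.5 * n_params * math.log(n)


def split_off_noise(points):
    """Move the m points farthest from the coordinate-wise median to the noise
    component, choosing the m with the smallest description length.

    Returns (cluster, noise). A toy stand-in for the search idea only.
    """
    d = len(points[0])
    center = [statistics.median(p[j] for p in points) for j in range(d)]
    ranked = sorted(
        points,
        key=lambda p: sum((p[j] - center[j]) ** 2 for j in range(d)),
        reverse=True,
    )
    # m stops at len - 1 so the Gaussian cluster always keeps at least one point.
    best_m = min(
        range(len(ranked)),
        key=lambda m: description_length([ranked[m:]], ranked[:m]),
    )
    return ranked[best_m:], ranked[:best_m]


if __name__ == "__main__":
    # Deterministic toy data: a tight 10 x 10 grid plus 10 far-away points on a ring.
    grid = [(0.05 * i - 0.225, 0.05 * j - 0.225) for i in range(10) for j in range(10)]
    ring = [(8.0 * math.cos(0.2 * math.pi * k), 8.0 * math.sin(0.2 * math.pi * k))
            for k in range(10)]
    cluster, noise = split_off_noise(grid + ring)
    print(len(cluster), len(noise))
```

Because every candidate split is scored by the same total code length, the amount of noise is chosen by the data rather than by a threshold set in advance, which is the role the abstract gives to MDL. The costs here are differential, so they are comparable across models only up to a shared discretization constant.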
Year
2010
DOI
10.1007/978-3-642-16184-1_18
Venue
Discovery Science
Keywords
best clustering, gaussian cluster, intuitively correct clustering, uniform distribution, different number, axis-aligned rectangular box, mdl principle, clear cluster structure, minimum description length principle, model selection, uniform component, synthetic data, normal distribution, minimum description length
Field
Cluster (physics), Data mining, Mathematical optimization, Heuristic, Expectation–maximization algorithm, Computer science, Minimum description length, Algorithm, Model selection, Uniform distribution (continuous), Gaussian, Cluster analysis
DocType
Conference
Volume
6332
ISSN
0302-9743
ISBN
3-642-16183-9
Citations
1
PageRank
0.40
References
4
Authors
3
Name
Order
Citations
PageRank
Panu Luosto1232.05
Jyrki Kivinen21011351.81
Heikki Mannila365951495.69