Title
Robust clustering in high dimensional data using statistical depths.
Abstract
Mean-based clustering algorithms such as bisecting k-means generally lack robustness. Although the componentwise median is a more robust alternative, it can be a poor center representative for high-dimensional data. A new algorithm is needed that is robust and works well on high-dimensional data sets such as gene expression data. Here we propose a new robust divisive clustering algorithm, the bisecting k-spatialMedian, based on the statistical spatial depth. A new subcluster selection rule, Relative Average Depth, is also introduced. We demonstrate that the proposed clustering algorithm outperforms the componentwise-median-based bisecting k-median algorithm for high dimension and low sample size (HDLSS) data by applying both algorithms to two real HDLSS gene expression data sets. When further applied to noisy real data sets, the proposed algorithm compares favorably in terms of robustness with the componentwise-median-based bisecting k-median algorithm. Statistical data depths provide an alternative way to find the "center" of multivariate data sets and are useful and robust for clustering.
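The abstract names the spatial depth and the spatial median without defining them. The sketch below is an illustration only, not the authors' implementation. It assumes the standard textbook definitions: the spatial depth of a point x is 1 minus the norm of the average unit vector pointing from the data points to x, and the spatial median is the point of maximal spatial depth, approximated here with the Weiszfeld iteration. The function names `spatial_depth` and `spatial_median` and the parameters `iters` and `tol` are hypothetical choices made for this example.

```python
import math


def spatial_depth(x, points):
    """Spatial depth of x: 1 - ||average unit vector from the points to x||."""
    dim = len(x)
    total = [0.0] * dim
    for p in points:
        diff = [xi - pi for xi, pi in zip(x, p)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:  # a point that coincides with x contributes a zero vector
            for j in range(dim):
                total[j] += diff[j] / norm
    n = len(points)
    return 1.0 - math.sqrt(sum((t / n) ** 2 for t in total))


def spatial_median(points, iters=200, tol=1e-9):
    """Approximate the spatial median with the Weiszfeld iteration.

    Assumption: data points that coincide with the current estimate are
    skipped, which is a common simplification, not a full treatment.
    """
    dim = len(points[0])
    n = len(points)
    # Start from the componentwise mean.
    m = [sum(p[j] for p in points) / n for j in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = math.sqrt(sum((pi - mi) ** 2 for pi, mi in zip(p, m)))
            if d < 1e-12:
                continue
            w = 1.0 / d  # inverse-distance weight
            den += w
            for j in range(dim):
                num[j] += w * p[j]
        if den == 0.0:
            break
        new_m = [v / den for v in num]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_m, m)))
        m = new_m
        if shift < tol:
            break
    return m


# Four corners of the unit square plus one far outlier.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
mean = [sum(p[j] for p in data) / len(data) for j in range(2)]
center = spatial_median(data)
```

On this toy data the mean is dragged toward the outlier, while the spatial median stays inside the unit square and has a higher spatial depth than the mean. That is the robustness property the abstract relies on.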
Year: 2007
DOI: 10.1186/1471-2105-8-S7-S8
Venue: BMC Bioinformatics
Keywords: high dimensional data, bioinformatics, algorithms, sample size, microarrays, k means, multivariate data
Field: High dimensional data sets, CURE data clustering algorithm, Clustering high-dimensional data, Computer science, Depth function, Robustness (computer science), Bioinformatics, Cluster analysis, Breakdown point
DocType: Journal
Volume: 8 Suppl 7
Issue: S-7
ISSN: 1471-2105
Citations: 23
PageRank: 0.52
References: 3
Authors: 4
Name             Order  Citations  PageRank
Yuanyuan Ding    1      303        15.04
Xin Dang         2      139        9.85
Hanxiang Peng    3      52         2.62
Dawn Wilkins     4      415        27.30