Title
Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.
Abstract
Motivation: Biologists often employ clustering techniques in the exploratory phase of microarray data analysis to discover relevant biological groupings. Given the availability of numerous clustering algorithms in the machine-learning literature, a user might want to select the one that performs best for their data set or application. Various validation measures have been proposed over the years to judge the quality of the clusters produced by a given clustering algorithm, including their biological relevance. Unfortunately, a given clustering algorithm can perform poorly under one validation measure while outperforming many other algorithms under another. A manual synthesis of results from multiple validation measures is nearly impossible in practice, especially when a large number of clustering algorithms are to be compared using several measures. An automated and objective way of reconciling the rankings is needed.
Results: Using a Monte Carlo cross-entropy algorithm, we successfully combine the ranks of a set of clustering algorithms under consideration via a weighted aggregation that optimizes a distance criterion. The proposed weighted rank aggregation allows for a far more objective and automated assessment of clustering results than a simple visual inspection. We illustrate our procedure using one simulated and three real gene expression data sets from various platforms, where we rank a total of eleven clustering algorithms using a combined examination of ten different validation measures. The aggregate rankings were found for a given number of clusters k and also for an entire range of k.
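The aggregation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance criterion or the sampling scheme, so the sketch assumes a weighted Spearman footrule distance, one importance weight per validation measure, and a standard cross-entropy update of a position-by-item probability matrix. The algorithm names, rankings, weights, and all function and parameter names are hypothetical.

```python
import random


def footrule(candidate, ranked_list):
    """Spearman footrule: sum of absolute position differences (assumed distance)."""
    pos = {item: i for i, item in enumerate(ranked_list)}
    return sum(abs(i - pos[item]) for i, item in enumerate(candidate))


def objective(candidate, ranked_lists, weights):
    """Weighted sum of distances from the candidate ordering to every input list."""
    return sum(w * footrule(candidate, lst) for lst, w in zip(ranked_lists, weights))


def sample_ordering(prob, items, rng):
    """Draw an ordering position by position, without replacement."""
    remaining = list(range(len(items)))
    ordering = []
    for row in prob:
        idx = rng.choices(remaining, weights=[row[j] for j in remaining])[0]
        remaining.remove(idx)
        ordering.append(items[idx])
    return tuple(ordering)


def ce_rank_aggregate(ranked_lists, weights, n_samples=500, elite_frac=0.1,
                      smoothing=0.7, n_iter=20, seed=0):
    """Cross-entropy Monte Carlo search for the ordering minimizing the objective."""
    rng = random.Random(seed)
    items = sorted(ranked_lists[0])
    n = len(items)
    index = {item: j for j, item in enumerate(items)}
    # prob[pos][j]: probability that item j is placed at position pos.
    prob = [[1.0 / n] * n for _ in range(n)]
    best, best_score = None, float("inf")
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iter):
        samples = [sample_ordering(prob, items, rng) for _ in range(n_samples)]
        samples.sort(key=lambda s: objective(s, ranked_lists, weights))
        elite = samples[:n_elite]
        top_score = objective(elite[0], ranked_lists, weights)
        if top_score < best_score:
            best, best_score = elite[0], top_score
        # Move the sampling distribution toward the elite orderings (smoothed).
        for pos in range(n):
            counts = [0] * n
            for s in elite:
                counts[index[s[pos]]] += 1
            for j in range(n):
                prob[pos][j] = ((1 - smoothing) * prob[pos][j]
                                + smoothing * counts[j] / n_elite)
    return best, best_score


# Hypothetical rankings of four algorithms under three validation measures.
lists = [
    ("kmeans", "pam", "hierarchical", "som"),
    ("pam", "kmeans", "hierarchical", "som"),
    ("kmeans", "hierarchical", "pam", "som"),
]
weights = [1.0, 1.0, 1.0]  # hypothetical equal importance per measure

best, score = ce_rank_aggregate(lists, weights)
print(best, score)
```

With only four items the optimum could be found by enumerating all 24 orderings; the sampling approach matters when the number of algorithms makes exhaustive search impractical.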
Year: 2007
DOI: 10.1093/bioinformatics/btm158
Venue: BIOINFORMATICS
Keywords: cross entropy, microarray data analysis, visual inspection, machine learning, monte carlo
Field: Data mining, Fuzzy clustering, CURE data clustering algorithm, Computer science, Artificial intelligence, Cluster analysis, Single-linkage clustering, Canopy clustering algorithm, Data stream clustering, Correlation clustering, Determining the number of clusters in a data set, Bioinformatics, Machine learning
DocType: Journal
Volume: 23
Issue: 13
ISSN: 1367-4803
Citations: 29
PageRank: 2.13
References: 10
Authors: 3
Name            Order  Citations  PageRank
Vasyl Pihur     1      44         3.63
Susmita Datta   2      378        23.64
Somnath Datta   3      393        30.33