Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. - Citegraph

Paper Info

Title
Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.

Abstract
Motivation: Biologists often employ clustering techniques in the exploratory phase of microarray data analysis to discover relevant biological groupings. Given the many clustering algorithms available in the machine-learning literature, a user might want to select the one that performs best for his/her data set or application. Various validation measures have been proposed over the years to judge the quality of the clusters produced by a given clustering algorithm, including their biological relevance. Unfortunately, a given clustering algorithm can perform poorly under one validation measure while outperforming many other algorithms under another. A manual synthesis of results from multiple validation measures is nearly impossible in practice, especially when a large number of clustering algorithms are to be compared using several measures. An automated and objective way of reconciling the rankings is needed.
Results: Using a Monte Carlo cross-entropy algorithm, we successfully combine the ranks of a set of clustering algorithms under consideration via a weighted aggregation that optimizes a distance criterion. The proposed weighted rank aggregation allows for a far more objective and automated assessment of clustering results than a simple visual inspection. We illustrate our procedure using one simulated as well as three real gene expression data sets from various platforms where we rank a total of eleven clustering algorithms using a combined examination of 10 different validation measures. The aggregate rankings were found for a given number of clusters k and also for an entire range of k.
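
The aggregation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a weighted Spearman footrule as the distance criterion (the abstract only says "a distance criterion"), full rankings of the same items under every measure, and a simplified cross-entropy update. The function names, the parameters (`n_samples`, `elite_frac`, `smoothing`, `n_iter`), and the algorithm labels and weights in the usage lines are all hypothetical.

```python
import random


def footrule(candidate, ranking):
    """Spearman footrule: sum of absolute position differences between two orderings."""
    pos = {item: i for i, item in enumerate(ranking)}
    return sum(abs(i - pos[item]) for i, item in enumerate(candidate))


def objective(candidate, rankings, weights):
    """Weighted sum of distances from the candidate to each input ranking (assumed criterion)."""
    return sum(w * footrule(candidate, r) for r, w in zip(rankings, weights))


def sample_ordering(prob, items, rng):
    """Draw an ordering position by position from prob[position][item], without replacement."""
    remaining = list(items)
    ordering = []
    for pos in range(len(items)):
        # A tiny constant keeps every remaining item drawable even if its probability hit zero.
        w = [prob[pos][it] + 1e-12 for it in remaining]
        choice = rng.choices(remaining, weights=w, k=1)[0]
        ordering.append(choice)
        remaining.remove(choice)
    return ordering


def ce_rank_aggregate(rankings, weights, n_samples=200, elite_frac=0.1,
                      smoothing=0.7, n_iter=50, seed=0):
    """Simplified cross-entropy Monte Carlo search for a low-distance aggregate ordering."""
    rng = random.Random(seed)
    items = sorted(rankings[0])
    n = len(items)
    # Start from a uniform distribution over items at every position.
    prob = [{it: 1.0 / n for it in items} for _ in range(n)]
    best, best_score = None, float("inf")
    for _ in range(n_iter):
        samples = [sample_ordering(prob, items, rng) for _ in range(n_samples)]
        samples.sort(key=lambda s: objective(s, rankings, weights))
        elite = samples[: max(1, int(elite_frac * n_samples))]
        top_score = objective(elite[0], rankings, weights)
        if top_score < best_score:
            best, best_score = elite[0], top_score
        # Move the sampling distribution toward the elite samples (smoothed update).
        for pos in range(n):
            for it in items:
                freq = sum(1 for e in elite if e[pos] == it) / len(elite)
                prob[pos][it] = smoothing * freq + (1 - smoothing) * prob[pos][it]
    return best, best_score


# Hypothetical usage: four clustering algorithms ranked under three validation measures.
rankings = [
    ["kmeans", "hierarchical", "pam", "som"],
    ["hierarchical", "kmeans", "som", "pam"],
    ["kmeans", "pam", "hierarchical", "som"],
]
weights = [0.5, 0.3, 0.2]  # hypothetical importance of each validation measure
aggregate, score = ce_rank_aggregate(rankings, weights)
print(aggregate, score)
```

The search is stochastic, so it returns the best ordering encountered rather than a guaranteed optimum. For a handful of items, the result can be checked against a brute-force scan of all permutations.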

Year	DOI	Venue
2007	10.1093/bioinformatics/btm158	BIOINFORMATICS
Keywords	Field	DocType
cross entropy,microarray data analysis,visual inspection,machine learning,monte carlo	Data mining,Fuzzy clustering,CURE data clustering algorithm,Computer science,Artificial intelligence,Cluster analysis,Single-linkage clustering,Canopy clustering algorithm,Data stream clustering,Correlation clustering,Determining the number of clusters in a data set,Bioinformatics,Machine learning	Journal
Volume	Issue	ISSN
23	13	1367-4803
Citations	PageRank	References
29	2.13	10
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Vasyl Pihur	1	44	3.63
Susmita Datta	2	378	23.64
Somnath Datta	3	393	30.33
