Title | ||
---|---|---|
Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. |
Abstract | ||
---|---|---|
Motivation: Biologists often employ clustering techniques in the exploratory phase of microarray data analysis to discover relevant biological groupings. Given the many clustering algorithms available in the machine-learning literature, a user might want to select the one that performs best for their data set or application. Various validation measures have been proposed over the years to judge the quality of the clusters produced by a given clustering algorithm, including their biological relevance. Unfortunately, a given clustering algorithm can perform poorly under one validation measure while outperforming many other algorithms under another. A manual synthesis of results from multiple validation measures is nearly impossible in practice, especially when a large number of clustering algorithms are to be compared using several measures. An automated and objective way of reconciling the rankings is needed. Results: Using a Monte Carlo cross-entropy algorithm, we successfully combine the ranks of a set of clustering algorithms under consideration via a weighted aggregation that optimizes a distance criterion. The proposed weighted rank aggregation allows for a far more objective and automated assessment of clustering results than a simple visual inspection. We illustrate our procedure using one simulated and three real gene expression data sets from various platforms, where we rank a total of 11 clustering algorithms using a combined examination of 10 different validation measures. The aggregate rankings were found for a given number of clusters k and also for an entire range of k. |
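
The weighted aggregation summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weighted Spearman footrule objective, the function names (`footrule`, `objective`, `sample_ordering`, `cross_entropy_aggregate`), the example data, and all parameter defaults are assumptions chosen for illustration. The sketch follows the general cross-entropy recipe: sample candidate orderings from a probability matrix, keep the best-scoring fraction, and shift the matrix toward those elite samples.

```python
import random


def footrule(candidate, ranked_list):
    """Spearman footrule distance between two orderings of the same items."""
    pos = {item: i for i, item in enumerate(ranked_list)}
    return sum(abs(i - pos[item]) for i, item in enumerate(candidate))


def objective(candidate, ranked_lists, weights):
    """Weighted sum of footrule distances from the candidate to each input list."""
    return sum(w * footrule(candidate, r) for r, w in zip(ranked_lists, weights))


def sample_ordering(items, prob, rng):
    """Draw one ordering by filling positions left to right from unused items."""
    remaining = list(items)
    ordering = []
    for position in range(len(items)):
        w = [prob[item][position] for item in remaining]
        choice = rng.choices(remaining, weights=w, k=1)[0]
        ordering.append(choice)
        remaining.remove(choice)
    return ordering


def cross_entropy_aggregate(ranked_lists, weights, n_samples=200, elite_frac=0.1,
                            smoothing=0.7, n_iter=30, seed=0):
    """Search for an ordering with a small weighted distance to all input lists.

    All defaults here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    items = list(ranked_lists[0])
    n = len(items)
    # prob[item][position]: current probability of placing item at position.
    prob = {item: [1.0 / n] * n for item in items}
    best, best_score = None, float("inf")
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iter):
        samples = [sample_ordering(items, prob, rng) for _ in range(n_samples)]
        samples.sort(key=lambda s: objective(s, ranked_lists, weights))
        elite = samples[:n_elite]
        elite_score = objective(elite[0], ranked_lists, weights)
        if elite_score < best_score:
            best, best_score = elite[0], elite_score
        # Move the probabilities toward the elite position frequencies.
        for item in items:
            for position in range(n):
                freq = sum(1 for s in elite if s[position] == item) / n_elite
                prob[item][position] = ((1 - smoothing) * prob[item][position]
                                        + smoothing * freq)
    return best, best_score


# Hypothetical example: three validation measures rank four algorithms.
ranked_lists = [
    ["kmeans", "pam", "hierarchical", "som"],
    ["pam", "kmeans", "hierarchical", "som"],
    ["kmeans", "hierarchical", "pam", "som"],
]
weights = [0.5, 0.3, 0.2]  # assumed importance of each validation measure
aggregate, score = cross_entropy_aggregate(ranked_lists, weights)
print(aggregate, score)
```

The search is stochastic, so the returned ordering is the best one sampled rather than a guaranteed optimum; fixing `seed` only makes a run repeatable.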
Year | DOI | Venue |
---|---|---|
2007 | 10.1093/bioinformatics/btm158 | BIOINFORMATICS |
Keywords | Field | DocType |
cross entropy, microarray data analysis, visual inspection, machine learning, monte carlo | Data mining, Fuzzy clustering, CURE data clustering algorithm, Computer science, Artificial intelligence, Cluster analysis, Single-linkage clustering, Canopy clustering algorithm, Data stream clustering, Correlation clustering, Determining the number of clusters in a data set, Bioinformatics, Machine learning | Journal |
Volume | Issue | ISSN |
23 | 13 | 1367-4803 |
Citations | PageRank | References |
29 | 2.13 | 10 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vasyl Pihur | 1 | 44 | 3.63 |
Susmita Datta | 2 | 378 | 23.64 |
Somnath Datta | 3 | 393 | 30.33 |