Title
Clustering algorithms optimizer: a framework for large datasets
Abstract
Clustering algorithms are employed in many bioinformatics tasks, including categorization of protein sequences and analysis of gene-expression data. Although these algorithms are routinely applied, many of them suffer from the following limitations: (i) they rely on predetermined parameter settings, such as a priori knowledge of the number of clusters; (ii) they involve nondeterministic procedures that yield inconsistent outcomes. A framework that addresses these shortcomings is therefore desirable. We present a data-driven framework consisting of two interrelated steps: SVD-based dimension reduction, followed by automated tuning of the algorithm's parameter(s). The dimension reduction step is adapted to run efficiently on very large datasets. The optimal parameter setting is selected using an internal evaluation criterion, the Bayesian Information Criterion (BIC). This framework can incorporate most clustering algorithms and improve their performance. In this study, we illustrate the effectiveness of the framework by incorporating the standard K-Means and Quantum Clustering algorithms. The resulting implementations are applied to several gene-expression benchmarks with significant success.
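The abstract does not give the exact formulas, so the following is only a minimal sketch of the two-step idea applied to K-Means, under stated assumptions. Step 1 centers the data and keeps its top r singular components (r is a hypothetical parameter). Step 2 runs K-Means for each candidate number of clusters k and keeps the k with the lowest BIC. Assumptions to flag: the BIC used here is the standard score for a hard-assignment mixture of spherical Gaussians with one shared variance, which may differ from the authors' formula; the farthest-point seeding is a swapped-in deterministic initialization chosen to illustrate limitation (ii); the names `svd_reduce`, `kmeans`, `bic` and `select_k` are hypothetical. This is not the authors' implementation, and it covers neither the Quantum Clustering variant nor the large-dataset adaptation of the SVD step.

```python
# Sketch only: helper names and the BIC form are assumptions, not the authors' code.
import numpy as np


def svd_reduce(X, r):
    """Center X and project it onto its top-r right singular vectors."""
    Xc = X - X.mean(axis=0)
    # Thin SVD: U is (n, m), s is (m,), with m = min(n, d).
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r] * s[:r]  # equals Xc @ Vt[:r].T


def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Lloyd's K-Means with deterministic farthest-point seeding (no random restarts)."""
    # Seed 1: the point farthest from the data mean; each further seed is the
    # point farthest from all seeds chosen so far.
    idx = [int(np.argmax(((X - X.mean(axis=0)) ** 2).sum(axis=1)))]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d2)))
    centers = X[idx].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return _assign(X, centers), centers


def bic(X, labels, centers):
    """BIC (lower is better) of a hard-assignment spherical Gaussian mixture
    with one shared variance, fitted by K-Means."""
    n, d = X.shape
    k = len(centers)
    rss = max(float(((X - centers[labels]) ** 2).sum()), 1e-12)
    var = rss / (n * d)  # ML estimate of the shared per-dimension variance
    counts = np.bincount(labels, minlength=k)
    counts = counts[counts > 0]  # skip empty clusters (0 * log 0 terms)
    log_lik = (np.sum(counts * np.log(counts / n))       # mixing weights
               - 0.5 * n * d * np.log(2 * np.pi * var)   # Gaussian normalizer
               - 0.5 * n * d)                            # RSS / (2 * var) at the ML variance
    n_params = k * d + (k - 1) + 1  # means, mixing weights, shared variance
    return -2.0 * log_lik + n_params * np.log(n)


def select_k(X, k_values, r):
    """Reduce X with SVD, run K-Means for each candidate k, return the BIC-optimal k."""
    Z = svd_reduce(X, r)
    scores = {}
    for k in k_values:
        labels, centers = kmeans(Z, k)
        scores[k] = bic(Z, labels, centers)
    return min(scores, key=scores.get), scores


# Usage on synthetic data: three well-separated blobs in 20 dimensions.
rng = np.random.default_rng(0)
means = np.array([[0.0] * 20, [10.0] * 20, [-10.0] * 20])
X = np.vstack([m + rng.normal(size=(50, 20)) for m in means])
best_k, scores = select_k(X, k_values=range(1, 7), r=2)
print(best_k, {k: round(float(v), 1) for k, v in scores.items()})
```

On these well-separated blobs the sketch should select k = 3, and because the seeding is deterministic, repeated runs return the same choice. The mixing-weight term matters here: without it, hard-assignment scores tend to reward splitting a single Gaussian cluster. A full `np.linalg.svd` is used only for brevity; for very large matrices a truncated or randomized SVD would likely be needed.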
Year
2007
DOI
10.1007/978-3-540-72031-7_8
Venue
ISBRA
Keywords
gene-expression data, svd-based dimension reduction, algorithms optimizer, quantum clustering algorithm, dimension reduction step, data-driven framework, clustering algorithm, automated tuning, gene-expression benchmarks, large datasets, optimal parameter setting, bayesian information criterion, singular value decomposition, principal component analysis, k means, protein sequence, a priori knowledge, gene expression
Field
Categorization, Singular value decomposition, Data mining, Bayesian information criterion, Dimensionality reduction, Nondeterministic algorithm, Computer science, Implementation, Artificial intelligence, Bioinformatics, Cluster analysis, Machine learning
DocType
Conference
Volume
4463
ISSN
0302-9743
Citations
2
PageRank
0.38
References
13
Authors
3
Name | Order | Citations | PageRank
Roy Varshavsky | 1 | 94 | 7.01
David Horn | 2 | 414 | 51.58
Michal Linial | 3 | 1502 | 149.92