Title
Biclustering of microarray data with MOSPO based on crowding distance.
Abstract
High-throughput microarray technologies have generated massive amounts of gene expression data, containing the expression levels of thousands of genes under hundreds of different experimental conditions. Microarray datasets are usually presented as 2D matrices, where rows represent genes and columns represent experimental conditions. The analysis of such datasets can discover local structures composed of sets of genes that show coherent expression patterns under subsets of experimental conditions. This has led to the development of sophisticated algorithms capable of extracting novel and useful knowledge from a biomedical point of view. In the medical domain, these patterns are useful for understanding various diseases, and they aid in more accurate diagnosis, prognosis, and treatment planning, as well as in drug discovery.
In this work we present CMOPSOB (Crowding distance based Multi-objective Particle Swarm Optimization Biclustering), a novel biclustering approach for microarray datasets that groups genes and conditions that are highly related within sub-portions of the microarray data. The objective of biclustering is to find sub-matrices, i.e. maximal subgroups of genes and subgroups of conditions where the genes exhibit highly correlated activities over a subset of conditions. Since the objectives of maximal size and high coherence conflict with each other, the problem is well suited to multi-objective modelling. CMOPSOB is based on a heuristic search technique, multi-objective particle swarm optimization, which simulates the movements of a flock of birds searching for food. In addition, nearest neighbour search strategies based on crowding distance and ε-dominance can rapidly converge to the Pareto front and guarantee diversity of solutions. We compare the potential of this methodology with other biclustering algorithms by analyzing two common, public datasets of gene expression profiles.
In all cases our method can find localized structures related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The mined patterns show significant biological relevance in terms of related biological processes, components and molecular functions in a species-independent manner.
The proposed CMOPSOB algorithm has been successfully applied to the biclustering of microarray datasets. It achieves good diversity in the obtained Pareto front and rapid convergence. Therefore, it is a useful tool for analyzing large microarray datasets.
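The abstract names Pareto dominance and crowding distance as the mechanisms that keep the solution set diverse. As a minimal sketch of those two ideas, the following uses the widely used NSGA-II style crowding distance and assumes every objective is minimized. The function names `dominates` and `crowding_distance` are hypothetical, and this is not necessarily the exact variant used in CMOPSOB.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def crowding_distance(points: Sequence[Sequence[float]]) -> List[float]:
    """Crowding distance of each objective vector within one front.

    Boundary points of every objective get infinite distance so they are
    always preferred; interior points accumulate the normalized gap
    between their two neighbours along each objective.
    """
    n = len(points)
    if n == 0:
        return []
    dist = [0.0] * n
    for k in range(len(points[0])):
        # Indices of the points sorted by the k-th objective.
        order = sorted(range(n), key=lambda i: points[i][k])
        lo, hi = points[order[0]][k], points[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # All values equal: this objective adds nothing.
        for pos in range(1, n - 1):
            left = points[order[pos - 1]][k]
            right = points[order[pos + 1]][k]
            dist[order[pos]] += (right - left) / (hi - lo)
    return dist


# Illustrative, made-up objective vectors (not data from the paper).
front = [(0.0, 4.0), (1.0, 2.0), (2.0, 1.0), (4.0, 0.0)]
distances = crowding_distance(front)
```

A larger crowding distance marks a solution in a sparsely populated region of the front, so a selection step that prefers larger values tends to spread solutions out rather than cluster them.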
Year: 2009
DOI: 10.1186/1471-2105-10-S4-S9
Venue: BMC Bioinformatics
Keywords: biological process, gene expression profiling, bioinformatics, algorithms, heuristic search, cluster analysis, high throughput, drug discovery, microarrays, maximal subgroup, pareto front, gene expression, treatment planning, microarray data
Field: Row, Drug discovery, Microarray, Biology, Microarray analysis techniques, Biclustering, Bioinformatics, Microarray databases, Gene expression profiling, DNA microarray
DocType: Journal
Volume: 10 Suppl 4
Issue: S-4
ISSN: 1471-2105
Citations: 54
PageRank: 1.04
References: 23
Authors: 4
Name          Order  Citations  PageRank
Junwan Liu    1      104        6.65
Zhoujun Li    2      964        115.99
Xiaohua Hu    3      2819       314.15
Yiming Chen   4      187        22.75