Title
Screening for ortholog clusters using multipartite graph clustering by quasi-concave set function optimization
Abstract
Finding orthologous genes, that is, similar genes in different genomes, is a fundamental problem in comparative genomics. We present a model for automatically extracting candidate ortholog clusters from a large set of genomes using a new clustering method for multipartite graphs. Groups of orthologous genes are found by focusing on gene similarities across genomes rather than on similarities between genes within a genome. The clustering problem is formulated as a series of combinatorial optimization problems whose solutions are interpreted as ortholog clusters. The objective function in each optimization problem is a quasi-concave set function, which can be maximized efficiently. The properties of these functions and the algorithm that maximizes them are presented. We applied our method to find ortholog clusters in the data that supports the manually curated Cluster of Orthologous Genes (COG), from 43 genomes containing 108,090 sequences. Candidate ortholog clusters were validated by comparison against the manually curated ortholog clusters in COG and by verifying annotations in Pfam and SCOP; in most cases they correlate strongly with the known results. An analysis of Pfam and SCOP annotations and COG membership for the sequences in the 7,701 clusters that include sequences from at least three organisms shows that 7,474 (97%) clusters contain sequences that are all consistent in at least one of the annotations or in their COG membership.
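The abstract does not spell out the objective function or the maximization algorithm, so the following is only a minimal sketch of one standard way to maximize a quasi-concave set function, not the paper's actual method. It assumes the objective has the form F(H) = min over i in H of pi(i, H), where the linkage pi(i, H) never decreases as H grows; under that assumption, repeatedly removing the weakest element and remembering the best intermediate set yields a maximizer. The linkage used here (sum of similarities to members of H from other genomes) and the names `linkage`, `maximize_min_linkage`, `genome_of`, and `sim` are hypothetical choices for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def linkage(gene: str, subset: Set[str], genome_of: Dict[str, str],
            sim: Dict[FrozenSet[str], float]) -> float:
    """Hypothetical linkage: total similarity from `gene` to members of
    `subset` that belong to a different genome (within-genome pairs ignored)."""
    return sum(
        sim.get(frozenset((gene, other)), 0.0)
        for other in subset
        if genome_of[other] != genome_of[gene]
    )


def maximize_min_linkage(genes: Iterable[str], genome_of: Dict[str, str],
                         sim: Dict[FrozenSet[str], float]) -> Tuple[Set[str], float]:
    """Greedy peeling for F(H) = min_{i in H} linkage(i, H).

    Assumes non-negative similarities, so linkage is monotone in H and
    F is quasi-concave. Returns the largest subset attaining the maximum.
    """
    current = set(genes)
    best_set: Set[str] = set()
    best_value = float("-inf")
    while current:
        scores = {g: linkage(g, current, genome_of, sim) for g in current}
        # The weakest element determines F(current); ties broken by name.
        weakest = min(current, key=lambda g: (scores[g], g))
        if scores[weakest] > best_value:  # strict: keep the largest set on ties
            best_value = scores[weakest]
            best_set = set(current)
        current.remove(weakest)
    return best_set, best_value


# Toy data (made up for illustration): three genomes A, B, C.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
sim = {
    frozenset(("a1", "b1")): 0.75,
    frozenset(("a1", "c1")): 0.5,
    frozenset(("b1", "c1")): 0.5,
    frozenset(("a2", "b1")): 0.125,
}
cluster, value = maximize_min_linkage(genome_of, genome_of, sim)
print(sorted(cluster), value)
```

On this toy input the weakly linked gene `a2` is peeled off first and the remaining three genes, one per genome, form the returned cluster. Recomputing every score in each round keeps the sketch short but costs quadratic work per round; an implementation intended for 108,090 sequences would presumably update scores incrementally.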
Year: 2005
DOI: 10.1007/11548706_43
Venue: RSFDGrC (2)
Keywords: combinatorial optimization problem, candidate ortholog cluster, ortholog cluster, cog membership, optimization problem, fundamental problem, quasi-concave set function optimization, multipartite graph, orthologous gene, curated ortholog cluster, different genomes, clustering problem, graph clustering, objective function, comparative genomics
Field: Genome, Set function, Multipartite, Pattern recognition, Computer science, Comparative genomics, Combinatorial optimization, Artificial intelligence, Cog, Cluster analysis, Optimization problem
DocType: Conference
Volume: 3642
ISSN: 0302-9743
ISBN: 3-540-28660-8
Citations: 1
PageRank: 0.41
References: 7
Authors: 3
Name
Order
Citations
PageRank
Akshay Vashist117612.64
Casimir A. Kulikowski2616299.37
Ilya Muchnik332347.03