Title |
---|
Screening for ortholog clusters using multipartite graph clustering by quasi-concave set function optimization |
Abstract |
---|
Finding orthologous genes (similar genes in different genomes) is a fundamental problem in comparative genomics. We present a model for automatically extracting candidate ortholog clusters from a large set of genomes using a new clustering method for multipartite graphs. Groups of orthologous genes are found by focusing on gene similarities across genomes rather than on similarities between genes within a genome. The clustering problem is formulated as a series of combinatorial optimization problems whose solutions are interpreted as ortholog clusters. The objective function in each optimization problem is a quasi-concave set function, which can be maximized efficiently. We present the properties of these functions and the algorithm for maximizing them. We applied our method to find ortholog clusters in the data underlying the manually curated Cluster of Orthologous Genes (COG) database, which covers 43 genomes containing 108,090 sequences. Candidate ortholog clusters were validated by comparison against the manually curated ortholog clusters in COG and by checking annotations in Pfam and SCOP; in most cases they show strong correlations with the known results. An analysis of Pfam and SCOP annotations and of COG membership for the sequences in 7,701 clusters that include sequences from at least three organisms shows that 7,474 (97%) of these clusters contain sequences that are all consistent in at least one of the annotations or in their COG membership. |
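
The abstract says each objective is a quasi-concave set function that can be maximized efficiently, but it does not give the function itself. The following is a minimal sketch, assuming the common construction F(H) = min over i in H of π(i, H), where the linkage π(i, H) never decreases as H grows. Here π is hypothetically taken to be the summed similarity of gene i to the members of H from other genomes. Such an F is quasi-concave, since F(A ∪ B) ≥ min(F(A), F(B)), and it can be maximized by peeling off the least-linked genes layer by layer. The functions `linkage`, `maximize_quasi_concave` and `extract_clusters`, the stopping rule, and the toy data are illustrative assumptions, not the authors' implementation.

```python
"""Hypothetical sketch: extracting clusters from a multipartite similarity
graph by repeatedly maximizing F(H) = min_{i in H} pi(i, H)."""


def linkage(i, H, weights, genome_of):
    """Assumed linkage pi(i, H): total similarity between gene i and the
    members of H that lie in *other* genomes. Within-genome edges are
    ignored, which makes the graph multipartite. With non-negative weights
    this is monotone: growing H never decreases pi(i, H)."""
    return sum(
        weights.get(frozenset((i, j)), 0.0)
        for j in H
        if genome_of[j] != genome_of[i]
    )


def maximize_quasi_concave(V, weights, genome_of):
    """Peel off the least-linked genes one layer at a time and remember the
    subset with the highest F value seen. For F built from a monotone
    linkage, the first subset reaching the maximum is the largest maximizer."""
    H = set(V)
    best_set, best_val = set(), float("-inf")
    while H:
        scores = {i: linkage(i, H, weights, genome_of) for i in H}
        val = min(scores.values())  # F(H)
        if val > best_val:  # strict '>' keeps the earliest (largest) maximizer
            best_set, best_val = set(H), val
        # Remove every gene attaining the minimum, then re-score the rest.
        H -= {i for i, s in scores.items() if s == val}
    return best_set, best_val


def extract_clusters(V, weights, genome_of):
    """Solve a series of problems: take the maximizer as a candidate cluster,
    delete its genes, and repeat on the remaining genes."""
    remaining = set(V)
    clusters = []
    while remaining:
        core, val = maximize_quasi_concave(remaining, weights, genome_of)
        if val <= 0:  # assumed stopping rule: no cross-genome similarity left
            break
        clusters.append(core)
        remaining -= core
    return clusters


# Toy data (invented for illustration): genes a*, b*, c* from genomes A, B, C.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
weights = {
    frozenset(("a1", "b1")): 5.0,
    frozenset(("a1", "c1")): 4.0,
    frozenset(("b1", "c1")): 6.0,
    frozenset(("a2", "b2")): 3.0,
    frozenset(("a1", "a2")): 100.0,  # same genome: ignored by linkage()
}

genes = list(genome_of)
clusters = extract_clusters(genes, weights, genome_of)
print([sorted(c) for c in clusters])  # [['a1', 'b1', 'c1'], ['a2', 'b2']]
```

On the toy data, the strong within-genome edge between a1 and a2 has no effect. The three-genome group {a1, b1, c1} is extracted first, which mirrors the abstract's point that only across-genome similarities drive the clustering.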
Year | DOI | Venue |
---|---|---|
2005 | 10.1007/11548706_43 | RSFDGrC (2) |
Keywords | Field | DocType |
---|---|---|
combinatorial optimization problem, candidate ortholog cluster, ortholog cluster, COG membership, optimization problem, fundamental problem, quasi-concave set function optimization, multipartite graph, orthologous gene, curated ortholog cluster, different genomes, clustering problem, graph clustering, objective function, comparative genomics | Genome, Set function, Multipartite, Pattern recognition, Computer science, Comparative genomics, Combinatorial optimization, Artificial intelligence, COG, Cluster analysis, Optimization problem | Conference |
Volume | ISSN | ISBN |
---|---|---|
3642 | 0302-9743 | 3-540-28660-8 |
Citations | PageRank | References |
---|---|---|
1 | 0.41 | 7 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Akshay Vashist | 1 | 176 | 12.64 |
Casimir A. Kulikowski | 2 | 616 | 299.37 |
Ilya Muchnik | 3 | 323 | 47.03 |