Title
Ortholog clustering on a multipartite graph.
Abstract
We present a method for automatically extracting groups of orthologous genes from a large set of genomes, based on a new clustering method for weighted multipartite graphs. The method assigns a score to an arbitrary subset of genes from multiple genomes to assess the orthologous relationships between the genes in the subset. This score is computed from the sequence similarities between the member genes and the phylogenetic relationships between the corresponding genomes. An ortholog cluster is found as the subset with the highest score, so ortholog clustering is formulated as a combinatorial optimization problem. The algorithm for finding an ortholog cluster runs in time O(|E| + |V| log |V|), where V and E are the sets of vertices and edges, respectively, of the graph. If the similarity scores are discretized into a constant number of bins, the run time improves to O(|E| + |V|). The proposed method was applied to seven complete eukaryote genomes for which manually curated ortholog clusters, KOG (eukaryotic ortholog clusters, http://www.ncbi.nlm.nih.gov/COG/new/), have been constructed. A comparison shows that our clusters are well correlated with the manually curated clusters. Finally, we demonstrate how gene order information can be incorporated into the proposed method to improve ortholog detection.
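The abstract does not state the score function, so the following is only a minimal sketch of one way a "highest-scoring subset" problem of this kind can be solved, not the authors' implementation. It assumes, hypothetically, that each gene's linkage to a subset is the sum of its similarities to subset members from other genomes, and that a subset's score is the smallest linkage among its members; the phylogenetic weighting mentioned in the abstract is omitted. Under those assumptions, repeatedly removing the weakest gene and remembering the best intermediate subset yields an optimal subset. The names `best_cluster`, `genome_of`, and `linkage` are illustrative.

```python
import heapq
from collections import defaultdict


def best_cluster(genome_of, edges):
    """Greedy peeling on a weighted multipartite graph (illustrative sketch).

    genome_of: dict mapping gene -> genome label (the part it belongs to)
    edges:     iterable of (gene_a, gene_b, similarity)
    Returns (cluster, score), where score is the minimum linkage in cluster.
    """
    adj = defaultdict(dict)
    for a, b, w in edges:
        # Multipartite graph: only genes from different genomes are linked.
        if genome_of[a] != genome_of[b]:
            adj[a][b] = w
            adj[b][a] = w

    # Assumed linkage: total similarity to genes still in the subset.
    linkage = {g: sum(adj[g].values()) for g in genome_of}
    heap = [(s, g) for g, s in linkage.items()]
    heapq.heapify(heap)

    alive = set(genome_of)
    removal_order = []
    best_score, best_size = float("-inf"), 0

    while alive:
        s, g = heapq.heappop(heap)
        if g not in alive or s != linkage[g]:
            continue  # stale heap entry left over from an earlier update
        # s is the minimum linkage of the current subset, i.e. its score.
        if s > best_score:
            best_score, best_size = s, len(alive)
        alive.remove(g)
        removal_order.append(g)
        for h, w in adj[g].items():
            if h in alive:
                linkage[h] -= w
                heapq.heappush(heap, (linkage[h], h))

    # The best subset consists of the last `best_size` genes to be removed.
    cluster = set(removal_order[len(removal_order) - best_size:])
    return cluster, best_score


genomes = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "d1": "D"}
similarities = [
    ("a1", "b1", 5), ("a1", "c1", 5), ("b1", "c1", 5),  # tight triangle
    ("a1", "d1", 1),                                    # weak outlier
    ("a1", "a2", 9),                                    # same genome: ignored
]
cluster, score = best_cluster(genomes, similarities)
```

This sketch uses a binary heap with lazy deletion, so it runs in O(|E| log |V|) rather than the O(|E| + |V| log |V|) bound quoted in the abstract. If the similarities were discretized into a constant number of bins, the heap could in principle be replaced by a bucket queue, which is consistent with the linear-time variant the abstract mentions.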
Year: 2007
DOI: 10.1109/TCBB.2007.1004
Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Keywords: graph-theoretic methods, arbitrary subset, curated ortholog cluster, multipartite graph, genetics, biology, ortholog cluster, complete eukaryote genomes, multiple genomes, clustering algorithms, highest score, eukaryotic ortholog cluster, ortholog clustering, corresponding genomes
Field: Genome, Similitude, Discrete mathematics, Combinatorics, Phylogenetic tree, Vertex (geometry), Combinatorial optimization, Multipartite graph, Bioinformatics, Phylogenetics, Cluster analysis, Mathematics
DocType: Journal
Volume: 4
Issue: 1
ISSN: 1545-5963
ISBN: 3-540-29008-7
Citations: 8
PageRank: 0.62
References: 17
Authors: 3
Name
Order
Citations
PageRank
Akshay Vashist 117612.64
Casimir A. Kulikowski 2616299.37
Ilya Muchnik 332347.03