Title
Protein Molecular Function Prediction By Bayesian Phylogenomics
Abstract
We present a statistical graphical model to infer specific molecular function for unannotated protein sequences using homology. Based on phylogenomic principles, SIFTER (Statistical Inference of Function Through Evolutionary Relationships) accurately predicts molecular function for members of a protein family given a reconciled phylogeny and available function annotations, even when the data are sparse or noisy. Our method produced specific and consistent molecular function predictions across 100 Pfam families in comparison to the Gene Ontology annotation database, BLAST, GOtcha, and Orthostrapper. We performed a more detailed exploration of functional predictions on the adenosine-5'-monophosphate/adenosine deaminase family and the lactate/malate dehydrogenase family, in the former case comparing the predictions against a gold standard set of published functional characterizations. Given function annotations for 3% of the proteins in the deaminase family, SIFTER achieves 96% accuracy in predicting molecular function for experimentally characterized proteins as reported in the literature. The accuracy of SIFTER on this dataset is a significant improvement over other currently available methods such as BLAST (75%), GeneQuiz (64%), GOtcha (89%), and Orthostrapper (11%). We also experimentally characterized the adenosine deaminase from Plasmodium falciparum, confirming SIFTER's prediction. The results illustrate the predictive power of exploiting a statistical model of function evolution in phylogenomic problems. A software implementation of SIFTER is available from the authors.
Year: 2005
DOI: 10.1371/journal.pcbi.0010045
Venue: PLOS COMPUTATIONAL BIOLOGY
Keywords: computer graphics, statistical graphics, proteins, statistical model, computational biology, protein sequence, phylogeny, gold standard, algorithms, proteomics, adenosine deaminase, malate dehydrogenase, protein family, programming languages, statistical inference, sequence alignment, genomics
Field: Sequence alignment, Protein family, Protein domain, Biology, Molecular evolution, Genomics, Statistical model, Statistical inference, Phylogenomics, Bioinformatics, Genetics
DocType: Journal
Volume: 1
Issue: 5
ISSN: 1553-734X
Citations: 42
PageRank: 3.10
References: 22
Authors: 4
Name | Order | Citations | PageRank
Barbara Engelhardt | 1 | 151 | 19.77
Michael I. Jordan | 2 | 31220 | 3640.80
Kathryn E. Muratore | 3 | 42 | 3.10
Steven E Brenner | 4 | 1679 | 308.17