Title
Protein function prediction for newly sequenced organisms
Abstract
Recent successes in protein function prediction have shown the superiority of approaches that integrate multiple types of experimental evidence over methods that rely solely on homology. However, newly sequenced organisms continue to represent a difficult challenge, because only their protein sequences are available and they lack data derived from large-scale experiments. Here we introduce S2F (Sequence to Function), a network propagation approach for the functional annotation of newly sequenced organisms. Our main idea is to systematically transfer functionally relevant data from model organisms to newly sequenced ones, thus allowing us to use a label propagation approach. S2F introduces a novel label diffusion algorithm that can account for the presence of overlapping communities of proteins with related functions. As most newly sequenced organisms are bacteria, we tested our approach in the context of bacterial genomes. Our extensive evaluation shows a great improvement over existing sequence-based methods, as well as over four state-of-the-art general-purpose protein function prediction methods. Our work demonstrates that employing a diffusion process over networks of transferred functional data is an effective way to improve predictions over simple homology. S2F is applicable to any type of newly sequenced organism as well as to those for which experimental evidence is available. A free, easy-to-run version of S2F is available at https://www.paccanarolab.org/s2f.
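To make the phrase "label propagation" concrete, the following is a minimal sketch of a generic propagation scheme on a small protein network. It is not the S2F diffusion algorithm, which the abstract describes as novel and aware of overlapping communities. The update rule (a weighted blend of neighbour scores and seed scores), the function name `propagate_labels`, the parameter `alpha`, and the toy network are all assumptions chosen for illustration.

```python
from math import sqrt

def propagate_labels(adjacency, seed_scores, alpha=0.8, iterations=50):
    """Generic iterative propagation: F <- alpha * S * F + (1 - alpha) * Y.

    adjacency   : symmetric list-of-lists of non-negative edge weights (W)
    seed_scores : initial per-protein scores for one function (Y)
    alpha       : assumed weight of neighbour evidence versus seed scores
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    # Symmetric normalisation S = D^-1/2 W D^-1/2; isolated nodes keep zero rows.
    norm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if adjacency[i][j] and degree[i] and degree[j]:
                norm[i][j] = adjacency[i][j] / sqrt(degree[i] * degree[j])
    scores = list(seed_scores)
    for _ in range(iterations):
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seed_scores[i]
            for i in range(n)
        ]
    return scores

# Hypothetical toy network: proteins 0-1-2 form a chain, protein 3 is isolated.
network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
seeds = [1.0, 0.0, 0.0, 0.0]  # only protein 0 starts with a transferred annotation
scores = propagate_labels(network, seeds)
```

In this toy case the score decays with distance from the seeded protein, and the isolated protein receives no score. This is the basic behaviour a network-based method relies on when only transferred evidence is available.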
Year
2021
DOI
10.1038/s42256-021-00419-7
Venue
Nature Machine Intelligence
DocType
Journal
Volume
3
Issue
12
Citations
0
PageRank
0.34
References
0
Authors
4
Name                 Order  Citations  PageRank
Mateo Torres         1      0          0.34
Haixuan Yang         2      835        33.22
Alfonso E. Romero    3      109        10.68
Alberto Paccanaro    4      206        24.14