Title
UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB.
Abstract
Motivation: Similarity-based methods have been widely used to infer the properties of genes and gene products that have little or no experimental annotation. New approaches that overcome the limitations of methods relying solely on sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins. Results: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) that compares their domain architectures, classifies proteins based on the similarities and propagates functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a subset of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. Of these predictions, 22% were for 2 812 016 previously non-annotated protein entries, indicating the significance of the value added by this approach. Availability and implementation: The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/.
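The abstract reports an average F-score but does not spell out how it is computed. As a minimal sketch, assuming the score is the usual harmonic mean of precision and recall over sets of GO terms, calculated per protein and then averaged, it could look like the following. The function names, the accession keys and the example GO identifiers are illustrative assumptions, not taken from the paper, and the authors' actual cross-validation protocol may differ.

```python
def f_score(predicted, reference):
    """F-score for one protein: harmonic mean of precision and recall
    between a predicted and a reference set of GO term identifiers."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    if true_pos == 0:
        # No overlap (or an empty set): precision and recall are both zero.
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


def average_f_score(predictions, references):
    """Mean per-protein F-score over every protein in `references`.

    Both arguments map a protein accession to a collection of GO terms.
    Proteins with no prediction count as a score of 0 (an assumption).
    """
    scores = [
        f_score(predictions.get(accession, ()), terms)
        for accession, terms in references.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical example: one protein predicted perfectly, one not predicted.
example_references = {
    "P00001": {"GO:0005524", "GO:0004672"},
    "P00002": {"GO:0003677"},
}
example_predictions = {"P00001": {"GO:0005524", "GO:0004672"}}
example_average = average_f_score(example_predictions, example_references)
```

Averaging per protein, rather than pooling all predictions, keeps heavily annotated proteins from dominating the score. Whether the paper averages this way is not stated in the abstract.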
Year
2016
DOI
10.1093/bioinformatics/btw114
Venue
BIOINFORMATICS
Field
Architecture domain, Data mining, UniProt Knowledgebase, Annotation, Gene ontology, Computer science, UniProt, Bioinformatics, Molecular Sequence Annotation
DocType
Journal
Volume
32
Issue
15
ISSN
1367-4803
Citations
1
PageRank
0.36
References
19
Authors
7
Name                  Order  Citations  PageRank
Tunca Dogan           1      21         3.00
Alistair MacDougall   2      1          0.36
Rabie Saidi           3      1          0.69
Diego Poggioli        4      1          0.69
Alex Bateman          5      5461       1054.58
Claire O'Donovan      6      1          0.36
Maria Jesus Martin    7      2793       365.41