Title
A comparison of sequence and structure protein domain families as a basis for structural genomics.
Abstract
Motivation: Protein families can be defined based on structure or sequence similarity. We wanted to compare two protein family databases, one based on structural similarity and one on sequence similarity, to investigate to what extent they overlap and how similar the definitions of corresponding families are, and to create a list of large protein families with unknown structure as a resource for structural genomics. We also wanted to increase the sensitivity of fold assignment by exploiting protein family HMMs.
Results: We compared Pfam, a protein family database based on sequence similarity, to Scop, which is based on structural similarity. We found that 70% of the Scop families exist in Pfam, while 57% of the Pfam families exist in Scop. Most families that occur in both databases correspond well to each other, but in some cases they differ. Such cases highlight situations in which structure and sequence approaches differ significantly. The comparison enabled us to compile a list of the largest families that do not occur in Scop; these are suitable targets for structure prediction and determination, and may be useful to guide projects in structural genomics. Notably, 13 of the 20 largest protein families without a known structure are likely transmembrane proteins. We also exploited Pfam to increase the sensitivity of detecting homologs of proteins with known structure, by comparing query sequences to Pfam HMMs that correspond to Scop families. For SWISSPROT+TREMBL, this increased fold assignment from 31% to 42% compared to using FASTA only. This method assigned a structure to 22% of the proteins in Saccharomyces cerevisiae, 24% in Escherichia coli, and 16% in Methanococcus jannaschii.
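The two overlap percentages in the abstract are directional: each is computed against a different denominator, so they need not agree. The following is a minimal sketch of that bookkeeping, not the authors' actual procedure. It assumes the comparison has already been reduced to a list of (Scop family, Pfam family) links. All identifiers and family sizes below are hypothetical toy data.

```python
from typing import Dict, List, Set, Tuple


def directional_coverage(
    scop: Set[str], pfam: Set[str], links: List[Tuple[str, str]]
) -> Tuple[float, float]:
    """Return (fraction of Scop families with a Pfam match,
    fraction of Pfam families with a Scop match)."""
    # Keep only links whose two ends are both known families.
    valid = [(s, p) for s, p in links if s in scop and p in pfam]
    matched_scop = {s for s, _ in valid}
    matched_pfam = {p for _, p in valid}
    return len(matched_scop) / len(scop), len(matched_pfam) / len(pfam)


def unmatched_by_size(
    pfam_sizes: Dict[str, int], links: List[Tuple[str, str]]
) -> List[str]:
    """List Pfam families with no Scop link, largest first."""
    matched = {p for _, p in links}
    unmatched = [f for f in pfam_sizes if f not in matched]
    return sorted(unmatched, key=lambda f: pfam_sizes[f], reverse=True)


# Hypothetical toy data: four Scop families, five Pfam families with sizes.
scop_families = {"s1", "s2", "s3", "s4"}
pfam_sizes = {"p1": 500, "p2": 120, "p3": 900, "p4": 40, "p5": 300}
# Two Scop families map onto the same Pfam family, which is one way
# the two directions end up with different percentages.
links = [("s1", "p1"), ("s2", "p1"), ("s3", "p2")]

scop_cov, pfam_cov = directional_coverage(scop_families, set(pfam_sizes), links)
targets = unmatched_by_size(pfam_sizes, links)

print(scop_cov, pfam_cov)  # 0.75 0.4
print(targets)             # ['p3', 'p5', 'p4']
```

In this toy case, three of four Scop families are covered but only two of five Pfam families are, and the unmatched Pfam families ranked by size play the role of the target list described in the abstract.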
Year
1999
DOI
10.1093/bioinformatics/15.6.480
Venue
BIOINFORMATICS
Keywords
structural genomics, protein domains
DocType
Journal
Volume
15
Issue
6
ISSN
1367-4803
Citations
18
PageRank
1.83
References
1
Authors
2
Name | Order | Citations | PageRank
Arne Elofsson | 1 | 633 | 56.98
Erik L L Sonnhammer | 2 | 3962 | 796.29