Title
Towards a complete map of the protein space based on a unified sequence and structure analysis of all known proteins.
Abstract
In search of global principles that may explain the organization of the space of all possible proteins, we study all known protein sequences and structures. In this paper we present a global map of the protein space based on our analysis. Our protein space contains all protein sequences in a non-redundant (NR) database, which includes all major sequence databases. Using the PSI-BLAST procedure, we defined 4,670 clusters of related sequences in this space, 1,421 of which are centered on a sequence of known structure. All 4,670 clusters were then compared using either a structure metric (when 3D structures are known) or a novel sequence profile metric, and these scores were used to define a unified and consistent metric between all clusters. Two schemes were employed to arrange these clusters into a meta-organization. The first uses a graph-theoretic method to cluster the clusters into a hierarchical organization; this organization extends our ability to predict the structure and function of many proteins beyond what is possible with existing sequence-analysis tools. The second uses a variation of multidimensional scaling to embed the clusters in a low-dimensional real space. This approach yields a projection of the protein space onto a 2D plane that gives a bird's-eye view of the protein space. Based on this map, we suggest a list of candidate target sequences of unknown structure that are likely to adopt new, unknown folds.
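The two meta-organization schemes described in the abstract can be illustrated with a small sketch. It assumes a symmetric distance matrix between clusters is already available (here computed from made-up 2D points, not data from the paper). The hierarchy is built by taking connected components of a threshold graph at increasing thresholds (single-linkage style), and the 2D map by classical (Torgerson) multidimensional scaling. Both are standard stand-ins, not the authors' exact graph method or their MDS variant, and the function names `threshold_components`, `hierarchy`, `jacobi_eigen` and `classical_mds_2d` are hypothetical.

```python
import math

# Sketch only: threshold-graph components and classical MDS are standard
# stand-ins, not the exact methods used in the paper.


def threshold_components(dist, threshold):
    """Group clusters linked, directly or transitively, by an edge of
    length <= threshold (connected components of a threshold graph)."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def hierarchy(dist, thresholds):
    """Components at increasing thresholds; each level only merges groups
    of the previous level, so the levels nest into a tree."""
    return [(t, threshold_components(dist, t)) for t in sorted(thresholds)]


def jacobi_eigen(a, sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix,
    via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds_2d(dist):
    """Classical MDS: double-center the squared distances and keep the two
    largest eigenpairs as 2D coordinates."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    mean = [sum(row) / n for row in sq]  # row means equal column means (symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    top = sorted(range(n), key=lambda k: vals[k], reverse=True)[:2]
    return [tuple(math.sqrt(max(vals[k], 0.0)) * vecs[i][k] for k in top)
            for i in range(n)]


if __name__ == "__main__":
    # Toy data: made-up points standing in for clusters; not from the paper.
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 1.0), (2.0, 6.0)]
    dist = [[math.dist(p, q) for q in pts] for p in pts]
    for t, groups in hierarchy(dist, [1.0, 4.5, 6.0]):
        print(t, groups)
    print(classical_mds_2d(dist))
```

Raising the threshold can only merge groups, so the levels form a tree. Classical MDS reproduces the input distances exactly only when they are Euclidean in two dimensions; for scores derived from sequence or structure comparison, the 2D coordinates are only an approximation of the cluster relationships.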
Year
2000
Venue
ISMB
Keywords
protein structure classification, structure analysis, protein map, protein sequence classification, complete map, protein space, unified sequence, protein sequence, sequence analysis
Field
Structure analysis, Graph theory, Cluster (physics), Protein structure database, Multidimensional scaling, Global Map, Computer science, Bioinformatics, Sequence analysis, Hierarchical organization
DocType
Conference
Volume
8
ISSN
1553-0833
ISBN
1-57735-115-0
Citations
9
PageRank
0.89
References
15
Authors
2
Name
Order
Citations
PageRank
G Yona165145.52
Michael Levitt258799.00