Title | ||
---|---|---|
Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction. |
Abstract | ||
---|---|---|
Background: Structural variations caused by a wide range of physicochemical and biological sources directly influence the function of a protein. For enzymatic proteins, the structure and chemistry of the catalytic binding site residues can be loosely defined as a substructure of the protein. Comparative analysis of drug-receptor substructures across and within species has been used for lead evaluation. Substructure-level similarity between the binding sites of functionally similar proteins has also been used to identify instances of convergent evolution among proteins. In functionally homologous protein families, shared chemistry and geometry at catalytic sites provide a common, local point of comparison among proteins that may differ significantly at the sequence, fold, or domain topology levels.
Results: This paper describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST) method uses all-against-all substructure comparison to determine Substructural Clusters (SCs), which characterize the binding site substructural variation within a protein family. In this paper we focus on examples of automatically determined SCs that can be linked to phylogenetic distance between family members, segregation by conformation, and organization by homology among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH) framework constructs a representative motif for each protein cluster among the SCs determined by FASST and combines these motifs into motif ensembles; a series of function prediction experiments shows that these ensembles improve the function prediction power of existing motifs.
Conclusions: FASST contributes a critical feedback and assessment step to existing binding site substructure identification methods and can be used for the thorough investigation of structure-function relationships. The application of MESH allows for an automated, statistically rigorous procedure for incorporating structural variation data into protein function prediction pipelines. Our work provides an unbiased, automated assessment of the structural variability of identified binding site substructures among protein structure families and a technique for exploring the relation of substructural variation to protein function. As available proteomic data continues to expand, the techniques proposed will be indispensable for the large-scale analysis and interpretation of structural data.
|
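
The general shape of an all-against-all comparison followed by clustering, plus picking one representative per cluster, can be illustrated with a minimal sketch. This is not the FASST or MESH implementation: it assumes each binding-site substructure is already matched atom-to-atom and superimposed (so RMSD is a direct coordinate comparison), and it swaps in single-linkage clustering at a fixed cutoff and a medoid as the cluster representative. The function names (`rmsd`, `all_against_all`, `single_linkage_clusters`, `medoid`), the toy coordinates, and the 0.5 cutoff are hypothetical.

```python
import math
from itertools import combinations

# ASSUMPTION: each substructure is a list of (x, y, z) coordinates whose atoms
# are already matched one-to-one and superimposed across all substructures.
Coords = list[tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Root-mean-square deviation between two matched, superimposed point sets."""
    if len(a) != len(b):
        raise ValueError("substructures must have the same number of matched atoms")
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def all_against_all(subs: dict[str, Coords]) -> dict[tuple[str, str], float]:
    """Distance for every unordered pair; keys are alphabetically sorted name pairs."""
    return {(i, j): rmsd(subs[i], subs[j]) for i, j in combinations(sorted(subs), 2)}


def single_linkage_clusters(names, dist, threshold):
    """Single-linkage clusters cut at `threshold`, i.e. connected components of
    the graph whose edges join pairs with distance <= threshold (union-find)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), d in dist.items():
        if d <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for n in sorted(names):
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


def medoid(cluster, dist):
    """Hypothetical representative: the member with the smallest summed distance
    to the other members of its cluster (ties go to the first member)."""
    def d(i, j):
        return 0.0 if i == j else dist[tuple(sorted((i, j)))]
    return min(cluster, key=lambda m: sum(d(m, o) for o in cluster))


# Toy usage with three made-up three-atom substructures.
subs = {
    "protA": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "protB": [(0.1, 0.0, 0.0), (1.0, 0.1, 0.0), (0.0, 1.0, 0.1)],
    "protC": [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.0, 1.0, 0.0)],
}
dist = all_against_all(subs)
clusters = single_linkage_clusters(sorted(subs), dist, threshold=0.5)
reps = [medoid(c, dist) for c in clusters]
print(clusters)  # [['protA', 'protB'], ['protC']]
print(reps)      # ['protA', 'protC']
```

Single linkage is used here only because, at a fixed cutoff, it reduces to connected components and needs no external library; an average- or complete-linkage criterion would produce tighter clusters and could be substituted without changing the rest of the sketch.
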
Year | DOI | Venue |
---|---|---|
2010 | 10.1186/1471-2105-11-242 | BMC Bioinformatics |

Keywords | Field | DocType |
---|---|---|
bioinformatics, protein family, convergent evolution, microarrays, enzymes, protein function prediction, protein conformation, binding site, protein folding, algorithms, structured data, binding sites, proteins, proteomics, protein structure, comparative analysis | Protein family, Structural variation, Protein folding, Proteomics, Biology, Protein superfamily, Bioinformatics, Protein Data Bank, Genetics, Protein function prediction, Protein structure | Journal |

Volume | Issue | ISSN |
---|---|---|
11 | 1 | 1471-2105 |

Citations | PageRank | References |
---|---|---|
8 | 0.52 | 19 |
Authors | ||
---|---|---|
5 |

Name | Order | Citations | PageRank |
---|---|---|---|
Drew H. Bryant | 1 | 59 | 3.79 |
Mark Moll | 2 | 885 | 56.55 |
Brian Y. Chen | 3 | 78 | 10.06 |
Viacheslav Y. Fofanov | 4 | 81 | 3.44 |
Lydia E. Kavraki | 5 | 5370 | 470.50 |