Title
Mining frequent patterns in protein structures: a study of protease families.
Abstract
Analysis of protein sequence and structure databases usually reveals frequent patterns (FPs) associated with biological function. Data mining techniques generally consider the physicochemical and structural properties of amino acids and their microenvironment in the folded structures. Dynamics is not usually considered, although proteins are not static and their function relates to conformational mobility in many cases. This work describes a novel unsupervised learning approach to discover FPs in protein families, based on biochemical, geometric and dynamic features. Without any prior knowledge of functional motifs, the method discovers the FPs for each type of amino acid and identifies the conserved residues in three protease subfamilies: the chymotrypsin and subtilisin subfamilies of serine proteases and the papain subfamily of cysteine proteases. The catalytic triad residues are distinguished by their strong spatial coupling (high interconnectivity) to other conserved residues. Although the spatial arrangements of the catalytic residues in the two subfamilies of serine proteases are similar, their FPs are found to be quite different. The present approach appears to be a promising tool for detecting functional patterns in rapidly growing structure databases and providing insights into the relationship among protein structure, dynamics and function. Availability: Available upon request from the authors.
Year
2004
DOI
10.1093/bioinformatics/bth912
Venue
ISMB/ECCB (Supplement of Bioinformatics)
Keywords
protein family,catalytic triad residue,structure databases,serine protease,biological function,conserved residue,frequent pattern,protein structure,protease family,amino acid,protein sequence,catalytic residue,structured data,unsupervised learning,structural dynamics
Field
Protein family,Subtilisin,Protein sequencing,Biology,Proteases,Protease,Bioinformatics,Catalytic triad,Protein structure,Peptide sequence
DocType
Conference
Volume
20 Suppl 1
Issue
1
ISSN
1367-4811
Citations
12
PageRank
1.18
References
9
Authors
2
Name
Order
Citations
PageRank
Shann-Ching Chen11138.51
Ivet Bahar236139.41