Abstract | ||
---|---|---|
Since there is a strong need for computational methods to predict and characterize functional sites for initial annotations of protein structures, a new methodology that relies on descriptions of the functional sites based on local properties is proposed in this paper. This new approach is independent of conserved residues and conserved residue geometry and takes advantage of the large number of protein structures available to construct models using a machine learning approach. Specifically, the proposed method performs feature extraction, clustering, and classification on a protein structure data set, and it was validated on metal-binding sites (Ca2+, Zn2+, Na+, K+, Mg2+, Mn2+, Cu2+, Fe3+, Hg2+, Cl-) present in a non-redundant PDB (a total of 11,959 metal-binding sites in 3,609 proteins). Feature extraction provided a description of the critical features of each metal-binding site, and these features were consistent with prior knowledge about the sites. Furthermore, the descriptors thus obtained could provide new insights about metal-binding site microenvironments. Results using k-fold cross-validation for classification showed accuracy above 90%. Complete proteins were scanned using these classifiers to locate metal-binding sites. Keywords: Functional Genomics, Protein functional sites, Feature Extraction, Clustering, Classification, Metal-binding sites. Java source code available upon request. Supplementary Website: http://dis.unal.edu.co/~biocomp/metals/ |
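
The abstract reports classification accuracy estimated with k-fold cross-validation but does not describe the classifier or the feature vectors. The following is a minimal sketch of that evaluation procedure only, not the authors' implementation. It assumes a simple nearest-centroid classifier and synthetic four-dimensional feature vectors; the function names (`k_fold_indices`, `cross_validate`, `make_toy_data`) and the labels `"site"` and `"non-site"` are hypothetical and chosen for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], x))
    return min(centroids, key=sq_dist)


def cross_validate(X, y, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        # Group the training vectors (everything outside the fold) by label.
        by_label = {}
        for i in range(len(X)):
            if i not in held_out:
                by_label.setdefault(y[i], []).append(X[i])
        centroids = {lab: centroid(rows) for lab, rows in by_label.items()}
        correct = sum(
            nearest_centroid_predict(centroids, X[i]) == y[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic, well-separated data; NOT real metal-binding site features."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("site", 3.0), ("non-site", -3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0) for _ in range(4)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"mean 5-fold accuracy: {cross_validate(X, y):.3f}")
```

Because every sample is held out exactly once, the averaged accuracy reflects performance on data not used to fit the centroids for that fold. The accuracy obtained on this toy data says nothing about the results reported in the paper.
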
Year | DOI | Venue |
---|---|---|
2007 | 10.1109/BIBM.2007.17 | BIBM |
Keywords | Field | DocType |
complete protein, binding site, feature extraction, protein structure, predicting protein functional sites, protein functional site, functional site, metal-binding site, new approach, protein structure data, metal-binding site microenvironments, novel methodology, bioinformatics, genomics, sequences, geometry, machine learning, solid modeling, protein engineering, statistics | Binding site, Computer science, Protein engineering, Functional genomics, Feature extraction, Artificial intelligence, Bioinformatics, Cluster analysis, Protein Data Bank (RCSB PDB), Machine learning, Protein structure, Java source code | Conference |
ISSN | ISBN | Citations |
2156-1125 | 0-7695-3031-1 | 0 |
PageRank | References | Authors |
0.34 | 7 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Leonardo Bobadilla | 1 | 9 | 2.01 |
Fernando Niño | 2 | 180 | 9.20 |
Edilberto Cepeda | 3 | 0 | 0.34 |
Manuel A Patarroyo | 4 | 29 | 1.94 |