Title
Finding Protein Domain Boundaries: An Automated, Non-Homology-Based Method
Abstract
A Bayesian algorithm identifies structural domains in proteins using only amino acid sequence information. This approach differs from other sequence-only approaches, which are typically sequence-homology-based, not fully automated, or dependent on the structure being known. It catalogs "pattern" frequencies (occurrences of groups of amino acids) in a nonredundant database of known protein domains to identify the patterns that appear to signal the beginnings and ends of domains. It then uses those patterns to score new sequences and find their domain boundaries. Inspecting the patterns that appear significant in marking the fronts or backs of domains reveals subtle differences in amino acid use along each domain's length. These patterns might elucidate differences in function between chemically similar amino acids. This article is part of a special issue on data mining in bioinformatics.
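The abstract does not specify how patterns are defined or how the Bayesian score is computed, so the following is only a minimal sketch of the general idea (catalog pattern frequencies near known boundaries, then score a new sequence), not the authors' method. It swaps in a simple smoothed log-odds score over contiguous k-mers as a stand-in. The function names, the choice of k-mers as "patterns", the pseudocount, the window size, and the toy sequences are all illustrative assumptions.

```python
import math
from collections import Counter


def kmer_counts(segments, k):
    """Count every contiguous k-mer in a list of sequence segments."""
    counts = Counter()
    for seg in segments:
        for i in range(len(seg) - k + 1):
            counts[seg[i:i + k]] += 1
    return counts


def log_odds_table(boundary_segments, background_segments, k=2, pseudocount=1.0):
    """Smoothed log-odds of each k-mer near a boundary vs. in the background.

    Assumption: "patterns" are plain contiguous k-mers; the paper's actual
    pattern definition is not given in the abstract.
    """
    fg = kmer_counts(boundary_segments, k)
    bg = kmer_counts(background_segments, k)
    vocab = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(vocab)
    bg_total = sum(bg.values()) + pseudocount * len(vocab)
    return {
        kmer: math.log(
            ((fg[kmer] + pseudocount) / fg_total)
            / ((bg[kmer] + pseudocount) / bg_total)
        )
        for kmer in vocab
    }


def score_positions(sequence, table, k=2, window=4):
    """Sum the log-odds of all k-mers in each sliding window of the sequence.

    K-mers never seen in training contribute 0 (no evidence either way).
    """
    scores = []
    for start in range(len(sequence) - window + 1):
        win = sequence[start:start + window]
        scores.append(
            sum(table.get(win[i:i + k], 0.0) for i in range(window - k + 1))
        )
    return scores


# Toy, made-up segments purely for illustration (not real domain data).
boundary_examples = ["MKLA", "MKVA"]
background_examples = ["GGSG", "AGGS"]
table = log_odds_table(boundary_examples, background_examples)
scores = score_positions("GGSGMKLAGG", table)
best_start = scores.index(max(scores))  # window most boundary-like under this toy model
```

In this sketch, the highest-scoring window is the candidate boundary region. A real system would need a nonredundant training set of known domains and a principled way to decide which patterns are significant, both of which the abstract mentions but does not detail.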
Year: 2005
DOI: 10.1109/MIS.2005.106
Venue: IEEE Intelligent Systems
Keywords: similar amino acid, approach catalog, bayesian algorithm, amino acids-in, structural domain, amino acid use, non-homology-based method, finding protein domain boundaries, domain boundary, sequence-only approach, known protein domain, amino acid sequence information, statistical analysis, protein sequence, proteins, amino acid, pattern recognition, genetics, bayesian approach, protein domains, structural biology
Field: Data mining, Protein structure database, Protein domain, Protein sequencing, Structural biology, Amino acid, Computer science, Homology (biology), Bioinformatics, Computational biology, Structural Classification of Proteins database, Bayesian probability
DocType: Journal
Volume: 20
Issue: 6
ISSN: 1541-1672
Citations: 3
PageRank: 0.43
References: 5
Authors: 2
Name: Brian M. Gurbaxani, Order: 1, Citations: 14, PageRank: 1.15
Name: Parag Mallick, Order: 2, Citations: 164, PageRank: 18.61