Title
A novel method of protein sequence classification based on oligopeptide frequency analysis and its application to search for functional sites and to domain localization.
Abstract
A new method for distinguishing among protein families, based on the oligopeptide composition of their amino acid sequences, is presented. It is assumed that any protein family can be characterized by a set of essential oligopeptides (an oligopeptide vocabulary), and a simple approach to finding such a vocabulary is suggested. Comparison of the vocabularies is shown to distinguish different families from one another and from random sequences. This comparison can be made successfully using the frequencies of a small set of 25 dipeptides (or tripeptides), and no preliminary alignment is necessary. Characteristic peptides are found to lie in functionally important regions, as shown for the GTP-binding domains of the translation elongation factors. The method is also reasonably efficient at localizing functional domains in amino acid sequences: for several functional domains, the average prediction error does not exceed three or four amino acid residues.
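The abstract does not specify how the vocabulary is chosen or how sequences are scored, so what follows is only a minimal Python sketch of the general idea under stated assumptions, not the authors' procedure. It assumes (1) overlapping dipeptide counts, (2) a vocabulary made of the 25 dipeptides most over-represented relative to the frequency expected from the family's own residue composition (an observed/expected ratio), (3) a sequence score equal to the fraction of its dipeptides that fall in a vocabulary, and (4) domain localization by sliding a fixed-width window along the sequence and reporting the window richest in vocabulary dipeptides. All function names (`dipeptides`, `build_vocabulary`, `vocabulary_score`, `classify`, `locate_domain`) and the window width are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def dipeptides(seq: str) -> list[str]:
    """Overlapping dipeptides of a sequence, e.g. 'GKT' -> ['GK', 'KT']."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def build_vocabulary(family: Iterable[str], size: int = 25) -> set[str]:
    """Assumed selection rule (hypothetical): keep the `size` dipeptides whose
    observed frequency most exceeds the frequency expected from the family's
    single-residue composition."""
    pair_counts: Counter[str] = Counter()
    residue_counts: Counter[str] = Counter()
    for seq in family:
        pair_counts.update(dipeptides(seq))
        residue_counts.update(seq)
    n_pairs = sum(pair_counts.values())
    n_res = sum(residue_counts.values())

    def enrichment(pair: str) -> float:
        observed = pair_counts[pair] / n_pairs
        expected = (residue_counts[pair[0]] / n_res) * (residue_counts[pair[1]] / n_res)
        return observed / expected

    # Highest enrichment first; ties broken alphabetically for determinism.
    ranked = sorted(pair_counts, key=lambda p: (-enrichment(p), p))
    return set(ranked[:size])


def vocabulary_score(seq: str, vocab: set[str]) -> float:
    """Fraction of the sequence's dipeptides that belong to the vocabulary."""
    pairs = dipeptides(seq)
    return sum(p in vocab for p in pairs) / len(pairs) if pairs else 0.0


def classify(seq: str, vocabularies: dict[str, set[str]]) -> str:
    """Assign the sequence to the family whose vocabulary scores highest."""
    return max(vocabularies, key=lambda name: vocabulary_score(seq, vocabularies[name]))


def locate_domain(seq: str, vocab: set[str], window: int = 10) -> int:
    """Start index of the first window containing the most vocabulary dipeptides."""
    best_start, best_hits = 0, -1
    for start in range(max(1, len(seq) - window + 1)):
        hits = sum(p in vocab for p in dipeptides(seq[start:start + window]))
        if hits > best_hits:
            best_start, best_hits = start, hits
    return best_start


if __name__ == "__main__":
    print(build_vocabulary(["AAAAGKAAAA"], size=1))  # {'GK'}
    print(locate_domain("AAAAAGKSTAAAAA", {"GK", "KS", "ST"}, window=4))  # 5
```

In this sketch, classifying a query means building one vocabulary per family and calling `classify`; a comparison with random sequences could be made by computing the same score for shuffled copies of the query. The window width trades positional resolution against noise, and the value used here is arbitrary.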
Year: 1993
DOI: 10.1093/bioinformatics/9.1.17
Venue: Computer Applications in the Biosciences
Keywords: frequency analysis, protein sequence
Field: Sequence alignment, Protein family, Elongation factor, Protein sequencing, Computer science, Peptide, Oligopeptide, Homology (biology), Bioinformatics, Vocabulary
DocType: Journal
Volume: 9
Issue: 1
ISSN: 0266-7061
Citations: 13
PageRank: 1.46
References: 0
Authors: 2
Order 1: Victor V. Solovyev (citations: 1933; PageRank: 5.93)
Order 2: Kira S. Makarova (citations: 57; PageRank: 5.84)