Title
Discovering Protein Function Classification Rules From Reduced Alphabet Representations Of Protein Sequences
Abstract
The paper explores the use of reduced alphabet representations of protein sequences in the data-driven discovery of sequence motif-based decision trees for classifying protein sequences into functional families. In addition to the 20-letter amino acid alphabet, a number of alternative representations of protein sequences were explored, using a variety of reduced alphabets based on groupings of amino acids by their physico-chemical properties. Classifiers were built from motifs identified by a multiple sequence alignment based motif discovery tool (MEME). Results of experiments on a data set of eleven protease families show that the classification performance of decision trees built on several reduced alphabets (e.g., a 7-letter alphabet that groups amino acids by mass and charge, and a 5-letter alphabet obtained by randomly grouping the 20 amino acids into 5 groups) is comparable to that of trees built on the 20-letter amino acid alphabet. The results also show that sequence motifs based on different alphabets capture regularities in different portions of the sequences. This raises the possibility that different alphabets might provide different but complementary insights into protein structure-function relationships.
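The core representation step, rewriting a sequence over a reduced alphabet and then recording which motifs occur in it, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the 6-group alphabet `GROUPS_6`, the helpers `make_mapping`, `random_groups`, `reduce_sequence` and `motif_features`, the toy sequence, and the motifs are all hypothetical. MEME reports motifs as probabilistic profiles; here they are simplified to regular expressions. `random_groups` only mirrors the idea of a random 5-group alphabet, and its seed and partitioning scheme are assumptions. The resulting binary vectors are the kind of input a decision-tree learner could split on.

```python
import random
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical 6-group physico-chemical alphabet, chosen only for illustration;
# it is NOT one of the groupings used in the paper.
GROUPS_6 = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "b": "KR",      # basic (positively charged)
    "d": "DE",      # acidic (negatively charged)
    "s": "GP",      # glycine and proline
}


def make_mapping(groups):
    """Return a residue -> group-symbol dict, checking that groups partition the 20 amino acids."""
    mapping = {aa: sym for sym, members in groups.items() for aa in members}
    total = sum(len(members) for members in groups.values())
    if sorted(mapping) != sorted(AMINO_ACIDS) or total != len(AMINO_ACIDS):
        raise ValueError("groups must partition the 20 standard amino acids")
    return mapping


def random_groups(k, seed=0):
    """Randomly partition the 20 amino acids into k groups labelled '0', '1', ... (k <= 10)."""
    rng = random.Random(seed)
    residues = list(AMINO_ACIDS)
    rng.shuffle(residues)
    return {str(i): "".join(residues[i::k]) for i in range(k)}


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet; non-standard residues become '?'."""
    return "".join(mapping.get(aa, "?") for aa in seq.upper())


def motif_features(reduced_seq, motifs):
    """Binary presence/absence vector: 1 if a motif (given as a regex) occurs in the sequence."""
    return [1 if re.search(m, reduced_seq) else 0 for m in motifs]


if __name__ == "__main__":
    mapping6 = make_mapping(GROUPS_6)
    reduced = reduce_sequence("MKTAYIAKQRQISFVKSHFSRQ", mapping6)  # toy sequence
    print(reduced)  # abparaabpbpaprabprrpbp
    print(motif_features(reduced, ["ra+b", "prr", "d.d"]))  # [1, 1, 0]
    groups5 = random_groups(5, seed=1)
    print(sorted(len(v) for v in groups5.values()))  # [4, 4, 4, 4, 4]
```

A reduced alphabet merges residues that the grouping treats as interchangeable. A single reduced-alphabet motif such as `ra+b` can therefore match sequence stretches that would require several distinct 20-letter motifs.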
Year
2002
Venue
Proceedings of the 6th Joint Conference on Information Sciences
Keywords
sequence motif, decision tree, amino acid, protein sequence, protein structure, multiple sequence alignment
Field
Sequence alignment, Sequence logo, Pattern recognition, Amino acid, Sequence motif, Artificial intelligence, Multiple sequence alignment, Protein function prediction, Mathematics, Sequence analysis, Multiple EM for Motif Elicitation
DocType
Conference
Citations
4
PageRank
0.71
References
6
Authors
3
Name | Order | Citations | PageRank
Carson M. Andorf | 1 | 72 | 8.86
Drena Dobbs | 2 | 423 | 35.43
Vasant Honavar | 3 | 3353 | 468.10