Title
Reducing the Space of Degenerate Patterns in Protein Remote Homology Detection
Abstract
In biology, the notion of a degenerate pattern plays a central role in describing various phenomena. For example, protein active site patterns, like those contained in the PROSITE database, e.g. [FY]DPC[LIM][ASG]C[ASG], are in general represented by degenerate patterns with character classes. Researchers have developed several approaches over the years to discover degenerate patterns. Although these methods have been exhaustively and successfully tested on genomes and proteins, their output often far exceeds the size of the original input, making it hard to manage and to interpret in refined analyses that require manual inspection. In this article we discuss a characterization of degenerate patterns with character classes and introduce the concept of pattern priority, for comparing and ranking different patterns without gaps, together with the class of underlying patterns, which makes it possible to filter any set of degenerate patterns into a new set that is linear in the size of the input sequence. We present some preliminary results on the detection of subtle signals in protein sequences with remote homologies. The results show that our approach drastically reduces the number of patterns output by a tool for protein sequence analysis, while retaining the functional ones.
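To make the notion of a degenerate pattern with character classes concrete, the following minimal Python sketch treats the PROSITE-style example from the abstract as a gap-free pattern, counts how many solid strings it can spell, and locates its occurrences in a sequence. The function names and the toy sequence are hypothetical and chosen only for illustration. The sketch does not implement the pattern priority or underlying-pattern filtering of the paper, which the abstract does not define.

```python
import re

# Pattern taken from the abstract; bracketed groups are character classes.
PATTERN = "[FY]DPC[LIM][ASG]C[ASG]"


def parse_classes(pattern):
    """Split a gap-free degenerate pattern into a list of character sets."""
    tokens = re.findall(r"\[([A-Z]+)\]|([A-Z])", pattern)
    # Each token is (class_letters, single_letter); exactly one is non-empty.
    return [set(cls or single) for cls, single in tokens]


def degeneracy(pattern):
    """Number of distinct solid strings the pattern can spell."""
    count = 1
    for cls in parse_classes(pattern):
        count *= len(cls)
    return count


def find_occurrences(pattern, sequence):
    """Return (0-based start, matched substring) pairs, overlaps included."""
    # A lookahead with a capturing group reports overlapping matches too.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


# Hypothetical toy sequence, not real protein data.
toy_sequence = "MKFDPCLACGAAYDPCMSCS"

print(degeneracy(PATTERN))                      # 54
print(find_occurrences(PATTERN, toy_sequence))  # [(2, 'FDPCLACG'), (12, 'YDPCMSCS')]
```

Because the character-class notation coincides with regular-expression syntax, the pattern can be passed to `re` unchanged. A pattern of this kind stands for many solid strings at once (here 2 x 3 x 3 x 3 = 54), which gives some intuition for why sets of discovered degenerate patterns can grow far beyond the size of the input.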
Year: 2013
DOI: 10.1109/DEXA.2013.36
Venue: Database and Expert Systems Applications
Keywords: bioinformatics, data mining, database management systems, genomics, pattern classification, proteins, PROSITE database, character classes, degenerate pattern characterization, degenerate pattern discovery, degenerate pattern space reduction, genomes, pattern priority, pattern ranking, protein active site patterns, protein remote homology detection, protein sequence analysis
Field: Genome, Degenerate energy levels, Data mining, Protein sequence analysis, Ranking, Computer science, Genomics, Homology (biology), PROSITE
DocType: Conference
ISSN: 1529-4188
ISBN: 978-0-7695-5070-1
Citations: 0
PageRank: 0.34
References: 12
Authors: 2
Name: Matteo Comin; Order: 1; Citations: 191; PageRank: 20.94
Name: Davide Verzotto; Order: 2; Citations: 63; PageRank: 3.96