Title
On Subset Seeds for Protein Alignment
Abstract
We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds versus Blastp.
Year: 2009
DOI: 10.1109/TCBB.2009.4
Venue: IEEE/ACM Trans. Comput. Biology Bioinform.
Keywords: pattern matching, local alignment, design methodology, cumulant, comparative analysis, selectivity, design method, molecular biophysics, sensitivity, similarity search, rna, databases, dna, bioinformatics, indexing terms, protein sequence, proteins
Field: Protein sequencing, Computer science, Artificial intelligence, Smith–Waterman algorithm, Bioinformatics, Pattern matching, Protein Databases, Machine learning, Nearest neighbor search
DocType: Journal
Volume: 6
Issue: 3
ISSN: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6 (3): 483-494, 2009
Citations: 9
PageRank: 0.58
References: 29
Authors: 7
Name | Order | Citations | PageRank
Mikhail A. Roytberg | 1 | 114 | 54.66
Anna Gambin | 2 | 177 | 20.88
Laurent Noé | 3 | 230 | 13.94
Slawomir Lasota | 4 | 240 | 26.30
Eugenia Furletova | 5 | 19 | 1.84
Ewa Szczurek | 6 | 49 | 6.75
Gregory Kucherov | 7 | 1003 | 74.54