Title
Designing seeds for similarity search in genomic DNA
Abstract
Large-scale comparison of genomic DNA is of fundamental importance in annotating functional elements of genomes. To perform large comparisons efficiently, BLAST (Methods: Companion Methods Enzymol 266 (1996) 460, J. Mol. Biol. 215 (1990) 403, Nucleic Acids Res. 25(17) (1997) 3389) and other widely used tools use seeded alignment, which compares only sequences that can be shown to share a common pattern or ''seed'' of matching bases. The literature suggests that the choice of seed substantially affects the sensitivity of seeded alignment, but designing and evaluating seeds is computationally challenging. This work addresses the problem of designing a seed to optimize performance of seeded alignment. We give a fast, simple algorithm based on finite automata for evaluating the sensitivity of a seed in a Markov model of ungapped alignments, along with extensions to mixtures and inhomogeneous Markov models. We give intuition and theoretical results on which seeds are good choices. Finally, we describe Mandala, a software tool for seed design, and show that it can be used to improve the sensitivity of alignment in practice.
Year
DOI
Venue
2005
10.1016/j.jcss.2004.12.003
J. Comput. Syst. Sci.
Keywords
Field
DocType
finite automaton,j. mol,mandala,functional element,biosequence comparison,common pattern,string matching,nucleic acids res,markov model,seed design,ungapped alignment,inhomogeneous markov model,companion methods enzymol,seeded alignment,genomic dna,similarity search
Software tool,String searching algorithm,Discrete mathematics,Computer science,Markov model,Algorithm,Intuition,Theoretical computer science,Finite-state machine,SIMPLE algorithm,genomic DNA,Nearest neighbor search
Journal
Volume
Issue
ISSN
70
3
Journal of Computer and System Sciences
ISBN
Citations 
PageRank 
1-58113-635-8
74
4.91
References 
Authors
23
3
Name
Order
Citations
PageRank
Jeremy Buhler182193.45
Uri Keich233623.24
Yanni Sun321921.16