Title
An experimental study of compressed indexing and local alignments of DNA
Abstract
Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed their practicality for indexing long DNA sequences such as the human genome (about 3 billion characters) in the main memory [5,13,16]. However, these indexes are designed for exact pattern matching, which is too stringent for most biological applications. The demand is often on finding local alignments (pairs of similar substrings with gaps allowed). In this paper, we show how to build a software called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments with any pattern P. Experiments reveal that BWT-SW is very efficient (e.g., aligning a pattern of length 3,000 with the human genome takes less than a minute). We have also analyzed BWT-SW mathematically, using a simpler model (with gaps disallowed) and random strings. We find that the expected running time is O(|T|0.628|P|). As far as we know, BWT-SW is the first practical tool that can find all local alignments.
Year
Venue
Keywords
2007
COCOA
pattern p. experiments,human genome,billion character,experimental study,biological application,dynamic programming,long dna,bwt index,local alignment,exact pattern matching,main memory,pattern matching,dna sequence,indexation
Field
DocType
Volume
Discrete mathematics,Dynamic programming,Substring,Computer science,Algorithm,Search engine indexing,Theoretical computer science,Software,Human genome,Pattern matching,Speedup
Conference
4616
ISSN
ISBN
Citations 
0302-9743
3-540-73555-0
0
PageRank 
References 
Authors
0.34
14
5
Name
Order
Citations
PageRank
Tak-Wah Lam11860164.96
Wing-Kin Sung21379128.24
Siu-Lung Tam312719.19
Chi-kwong Wong4203.87
Siu-Ming Yiu519719.88