Title
A Fast Algorithm for Constructing Suffix Arrays for Fixed-Size Alphabets
Abstract
The suffix array of a string T is basically a sorted list of all the suffixes of T. Suffix arrays have been fundamental index data structures in computational biology. If we are to search a DNA sequence in a genome sequence, we construct the suffix array for the genome sequence and then search the DNA sequence in the suffix array. In this paper, we consider the construction of the suffix array of T of length n where the size of the alphabet is fixed. It has been well-known that one can construct the suffix array of T in O(n) time by constructing suffix tree of T and traversing the suffix tree. Although this approach takes O(n) time, it is not appropriate for practical use because it uses a lot of spaces and it is complicated to implement. Recently, almost at the same time, several algorithms have been developed to directly construct suffix arrays in O(n) time. However, these algorithms are developed for integer alphabets and thus do not exploit the properties given when the size of the alphabet is fixed. We present a fast algorithm for constructing suffix arrays for the fixed-size alphabet. Our algorithm constructs suffix arrays faster than any other algorithms developed for integer or general alphabets when the size of the alphabet is fixed. For example, we reduced the time required for constructing suffix arrays for DNA sequences by 25%-38%. In addition, we do not sacrifice the space to improve the running time. The space required by our algorithm is almost equal to or even less than those required by previous fast algorithms.
Year
DOI
Venue
2004
10.1007/978-3-540-24838-5_23
Lecture Notes in Computer Science
Keywords
Field
DocType
data structure,dna sequence,indexation,genome sequence,computational biology
Discrete mathematics,Suffix,Computer science,Algorithm,Suffix array,FM-index,Generalized suffix tree,Suffix tree,Compressed suffix array,String (computer science),Longest common substring problem
Conference
Volume
ISSN
Citations 
3059
0302-9743
10
PageRank 
References 
Authors
0.64
18
3
Name
Order
Citations
PageRank
Dong Kyue Kim122622.59
Junha Jo2100.64
Heejin Park323521.63