Title
Efficient and scalable indexing techniques for biological sequence data
Abstract
We investigate indexing techniques for sequence data, which are crucial in a wide variety of applications that require efficient, scalable, and versatile search algorithms. Recent research has focused on suffix trees (ST) and suffix arrays (SA) as desirable index representations. Existing solutions for very long sequences, however, provide either efficient index construction or efficient search, but not both. We propose a new ST representation, STTD64, which has reasonable construction time and storage requirements and is efficient in search. We have implemented the construction and search algorithms for the proposed technique and conducted numerous experiments to evaluate its performance on various types of real sequence data. Our results show that while the construction time for STTD64 is comparable with that of current ST-based techniques, it outperforms them in search. Compared to ESA, the best known SA technique, STTD64 is slower to construct, but has a similar space requirement and comparable search time. Unlike ESA, which is memory-based, STTD64 is scalable and can handle very long sequences.
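As a rough illustration of what a suffix array index looks like, the following minimal Python sketch builds a suffix array naively and searches it with binary search. It is not STTD64 or ESA, neither of which the abstract describes in enough detail to reproduce; the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the sort-based construction is only practical for short sequences.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting suffixes directly (an assumption made for
    brevity; real indexes for long sequences use far more efficient methods).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text, using suffix array sa."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    dna = "ACGTACGA"
    sa = build_suffix_array(dna)
    print(find_occurrences(dna, sa, "ACG"))
```

Each query costs a logarithmic number of prefix comparisons once the array exists, which is the construction-versus-search trade-off the abstract refers to.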
Year: 2007
DOI: 10.1007/978-3-540-71233-6_36
Venue: BIRD
Keywords: efficient index construction, scalable indexing technique, biological sequence data, current st, reasonable construction time, search algorithm, efficient search, construction time, versatile search algorithm, long sequence, slower construction time, comparable search time, indexation, biological database
Field: Data mining, Search algorithm, Suffix, Computer science, Search engine indexing, Biological database, Theoretical computer science, Data sequences, Bioinformatics, Compressed suffix array, Scalability
DocType: Conference
Volume: 4414
ISSN: 0302-9743
Citations: 3
PageRank: 0.41
References: 23
Authors: 3
Name                 Order  Citations  PageRank
Mihail Halachev      1      8          1.51
Nematollaah Shiri    2      280        28.31
Anand Thamildurai    3      4          0.76