Abstract | ||
---|---|---|
A succinct text index uses space proportional to the text itself, say, two times n log*** for a text of n characters over an alphabet of size *** . In the past few years, there were several exciting results leading to succinct indexes that support efficient pattern matching. In this paper we present the first succinct index for a text that contains wildcards. The space complexity of our index is (3 + o (1))n log*** + O (***logn ) bits, where *** is the number of wildcard groups in the text. Such an index finds applications in indexing genomic sequences that contain single-nucleotide polymorphisms (SNP), which could be modeled as wildcards. In the course of deriving the above result, we also obtain an alternate succinct index of a set of d patterns for the purpose of dictionary matching. When compared with the succinct index in the literature, the new index doubles the size (precisely, from n log*** to 2 n log*** , where n is the total length of all patterns), yet it reduces the matching time to O (m log*** + m logd + occ ), where m is the length of the query text. It is worth-mentioning that the time complexity no longer depends on the total dictionary size. |
Year | DOI | Venue |
---|---|---|
2009 | 10.1007/978-3-642-03784-9_5 | SPIRE |
Keywords | Field | DocType |
new index,succinct index,m log,n character,n log,dictionary matching,succinct text index,succinct text indexing,times n log,query text,alternate succinct index,genome sequence,indexation,time complexity,single nucleotide polymorphism,pattern matching,space complexity | Wildcard,Combinatorics,Wildcard character,Code segment,Search engine indexing,Time complexity,Pattern matching,Mathematics,Alphabet | Conference |
Volume | ISSN | Citations |
5721 | 0302-9743 | 16 |
PageRank | References | Authors |
0.68 | 15 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alan Tam | 1 | 60 | 2.71 |
Edward Wu | 2 | 220 | 8.81 |
Tak-Wah Lam | 3 | 1860 | 164.96 |
Siu-ming Yiu | 4 | 1026 | 92.90 |