Abstract | ||
---|---|---|
Indexing very large collections of strings, such as those produced by the widespread next generation sequencing technologies, heavily relies on multi-string generalization of the Burrows-Wheeler Transform (BWT): large requirements of in-memory approaches have stimulated recent developments on external memory algorithms. The related problem of computing the Longest Common Prefix (LCP) array of a set of strings is instrumental to compute the suffix-prefix overlaps among strings, which is an essential step for many genome assembly algorithms. In a previous paper, we presented an in-memory divide- and-conquer method for building the BWT and LCP where we merge partial BWTs with a forward approach to sort suffixes.In this paper, we propose an alternative backward strategy to develop an external memory method to simultaneously build the BWT and the LCP array on a collection of mstrings of different lengths. The algorithm over a set of strings having constant length khas O(mkl) time and I/O volume, using O(k + m) main memory, where lis the maximum value in the LCP array. (C) 2020 Elsevier B.V. All rights reserved. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1016/j.tcs.2020.11.041 | THEORETICAL COMPUTER SCIENCE |
Keywords | DocType | Volume |
Burrows-Wheeler transform, Longest common prefix array, External-memory algorithms | Journal | 862 |
ISSN | Citations | PageRank |
0304-3975 | 0 | 0.34 |
References | Authors | |
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Paola Bonizzoni | 1 | 502 | 52.23 |
Gianluca Della Vedova | 2 | 342 | 36.39 |
Yuri Pirola | 3 | 128 | 15.79 |
Marco Previtali | 4 | 0 | 0.34 |
Raffaella Rizzi | 5 | 130 | 13.58 |