A New Lightweight Algorithm to compute the BWT and the LCP array of a Set of Strings. - Citegraph

Paper Info

Title
A New Lightweight Algorithm to compute the BWT and the LCP array of a Set of Strings.

Abstract
Indexing of very large collections of strings such as those produced by the widespread sequencing technologies, heavily relies on multi-string generalizations of the Burrows-Wheeler Transform (BWT), and for this problem various in-memory algorithms have been proposed. The rapid growing of data that are processed routinely, such as in bioinformatics, requires a large amount of main memory, and this fact has motivated the development of algorithms, to compute the BWT, that work almost entirely in external memory. On the other hand, the related problem of computing the Longest Common Prefix (LCP) array is often instrumental in several algorithms on collection of strings, such as those that compute the suffix-prefix overlap among strings, which is an essential step for many genome assembly algorithms. The best current lightweight approach to compute BWT and LCP array on a set of $m$ strings, each one $k$ characters long, has I/O complexity that is $O(mk^2 log \|Sigma\|)$ (where $\|Sigma\|$ is the size of the alphabet), thus it is not optimal. In this paper we propose a novel approach to build BWT and LCP array (simultaneously) with $O(kmL(log k +log sigma))$ I/O complexity, where $L$ is the length of longest substring that appears at least twice in the input strings.

Year	Venue	Field
2016	arXiv: Data Structures and Algorithms	LCP array,Discrete mathematics,Substring,Combinatorics,Generalization,Search engine indexing,Algorithm,Sigma,Mathematics,Auxiliary memory,Alphabet
DocType	Volume	Citations
Journal	abs/1607.08342	0
PageRank	References	Authors
0.34	13	5

Authors (5 rows)

Cited by (0 rows)

References (13 rows)

Name	Order	Citations	PageRank
Paola Bonizzoni	1	502	52.23
Gianluca Della Vedova	2	342	36.39
Serena Nicosia	3	0	0.34
Marco Previtali	4	22	5.45
Raffaella Rizzi	5	130	13.58

1