Lightweight BWT construction for very large string collections - Citegraph

Paper Info

Title
Lightweight BWT construction for very large string collections

Abstract
A modern DNA sequencing machine can generate a billion or more sequence fragments in a matter of days. The many uses of the Burrows-Wheeler transform (BWT) in compression and indexing are well known, but the computational demands of creating the BWT of datasets this large have prevented its applications from being widely explored in this context. We address this obstacle by presenting two algorithms capable of computing the BWT of very large string collections. The algorithms are lightweight in that the first needs O(m log m) bits of memory to process m strings and the memory requirements of the second are constant with respect to m. We evaluate our algorithms on collections of up to 1 billion strings and compare their performance to other approaches on smaller datasets. Although our tests were on collections of DNA sequences of uniform length, the algorithms themselves apply to any string collection over any alphabet.
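To make concrete what "the BWT of a string collection" means, here is a minimal sketch of the textbook definition. It is not either of the paper's lightweight algorithms: it materializes and sorts every suffix, which is exactly the memory cost those algorithms are designed to avoid. The sketch assumes each string S_i is terminated by its own end marker $_i, that every end marker sorts before all real characters, and that $_i < $_k whenever i < k. The function name `collection_bwt` is hypothetical, and all end markers are printed as `$` in the output.

```python
def collection_bwt(strings: list[str]) -> str:
    """Naive BWT of a string collection, for illustration only.

    Assumes string i is terminated by a distinct end marker $_i, with
    $_0 < $_1 < ... < every real character. Memory grows with the total
    length of all suffixes, so this only suits tiny inputs.
    """
    entries = []
    for i, s in enumerate(strings):
        # Include the suffix consisting of the end marker alone (j == len(s)).
        for j in range(len(s) + 1):
            # Real characters get flag 1, the end marker gets flag 0, so every
            # marker sorts first; markers are then ordered by string index i.
            key = tuple((1, ord(c)) for c in s[j:]) + ((0, i),)
            # Symbol cyclically preceding this suffix in S_i $_i:
            # the end marker itself when the suffix is the whole string.
            prev = s[j - 1] if j > 0 else "$"
            entries.append((key, prev))
    # Keys are pairwise distinct because each suffix ends in a unique marker.
    entries.sort(key=lambda e: e[0])
    return "".join(prev for _, prev in entries)


print(collection_bwt(["BANANA"]))    # -> ANNB$AA
print(collection_bwt(["AC", "CA"]))  # -> CAC$A$
```

For a single string this reduces to the familiar BWT of `BANANA$`; for a collection, the output contains exactly one `$` per string and is a permutation of all input characters plus the end markers.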

Year: 2011
DOI: 10.1007/978-3-642-21458-5_20
Venue: CPM
Keywords: billion string, m log m, dna sequence, string collection, smaller datasets, modern dna, computational demand, memory requirement, m string, large string collection, lightweight bwt construction, bwt, next generation sequencing
Field: Computer science, Search engine indexing, Theoretical computer science, DNA sequencing, DNA sequencer, Alphabet
DocType: Conference
Volume: 6661
ISSN: 0302-9743
Citations: 25
PageRank: 1.51
References: 19
Authors: 3

Authors

Name	Order	Citations	PageRank
Markus J. Bauer	1	121	6.13
Anthony J Cox	2	198	13.63
Giovanna Rosone	3	193	21.77