Lightweight LCP construction for next-generation sequencing datasets - Citegraph

Paper Info

Title
Lightweight LCP construction for next-generation sequencing datasets

Abstract
The advent of "next-generation" DNA sequencing (NGS) technologies has meant that collections of hundreds of millions of DNA sequences are now commonplace in bioinformatics. Knowing the longest common prefix array (LCP) of such a collection would facilitate the rapid computation of maximal exact matches, shortest unique substrings and shortest absent words. CPU-efficient algorithms for computing the LCP of a string have been described in the literature, but require the presence in RAM of large data structures. This prevents such methods from being feasible for NGS datasets. In this paper we propose the first lightweight method that simultaneously computes, via sequential scans, the LCP and BWT of very large collections of sequences. Computational results on collections as large as 800 million 100-mers demonstrate that our algorithm scales to the vast sequence collections encountered in human whole genome sequencing experiments.
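
To make the two structures named in the abstract concrete, the following is a minimal in-memory sketch that builds the BWT and LCP array of a tiny collection by sorting all suffixes explicitly. It is not the lightweight, sequential-scan method the paper proposes; it only illustrates what the outputs are. The function name `bwt_and_lcp`, the use of a single printed `$` for every end marker, and the rule that end markers are distinct and ordered by read index are assumptions made for this illustration.

```python
def bwt_and_lcp(reads):
    """Naive BWT and LCP array of a collection of strings (illustration only)."""
    # Encode every symbol as a sort key. The end marker of read i becomes
    # (0, i): markers are distinct, smaller than any letter, and ordered by
    # read index (an assumed convention). A letter c becomes (1, c).
    suffixes = []
    for i, read in enumerate(reads):
        keys = [(1, c) for c in read] + [(0, i)]
        for start in range(len(keys)):
            # BWT symbol: the character preceding this suffix in its read,
            # or "$" when the suffix is the whole read.
            prev = read[start - 1] if start > 0 else "$"
            suffixes.append((tuple(keys[start:]), prev))

    # Sort all suffixes of all reads lexicographically.
    suffixes.sort(key=lambda s: s[0])

    bwt = "".join(prev for _, prev in suffixes)

    # LCP[k] = length of the longest common prefix of the k-th and (k-1)-th
    # sorted suffixes; LCP[0] is set to 0 here.
    lcp = [0]
    for (a, _), (b, _) in zip(suffixes, suffixes[1:]):
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        lcp.append(n)
    return bwt, lcp


if __name__ == "__main__":
    bwt, lcp = bwt_and_lcp(["ACG", "AC"])
    print(bwt, lcp)
```

This sketch holds every suffix in memory, so its space use grows quadratically with read length and linearly with the number of reads. That is the kind of RAM requirement the abstract says rules out such approaches for NGS-scale collections.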

Year	2012
DOI	10.1007/978-3-642-33122-0_26
Venue	WABI'12 Proceedings of the 12th international conference on Algorithms in Bioinformatics
Keywords	computational result, cpu-efficient algorithm, shortest absent word, next-generation sequencing datasets, dna sequencing, shortest unique substrings, dna sequence, lightweight lcp construction, ngs datasets, large data structure, algorithm scale, large collection, bwt
DocType	Conference
Volume	abs/1305.0160
ISSN	Lecture Notes in Computer Science Volume 7534, 2012, pp 326-337
Citations	16
PageRank	0.71
References	16
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Markus J. Bauer	1	121	6.13
Anthony J. Cox	2	198	13.63
Giovanna Rosone	3	193	21.77
Marinella Sciortino	4	225	22.34
