Title
FSG: Fast String Graph Construction for De Novo Assembly.
Abstract
The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this article, we explore a novel approach to compute the string graph, based on the FM-index and Burrows and Wheeler Transform. We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the string graph assembler (SGA) as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, we have studied the effect of coverage rates on the running times.
Year
DOI
Venue
2017
10.1089/cmb.2017.0089
JOURNAL OF COMPUTATIONAL BIOLOGY
Keywords
Field
DocType
Burrows and Wheeler Transform,genome assembly,string graph
String searching algorithm,Assemblers,External Data Representation,Commentz-Walter algorithm,Theoretical computer science,String graph,String metric,Mathematics,Sequence assembly,Lossless compression
Journal
Volume
Issue
ISSN
24.0
10
1066-5277
Citations 
PageRank 
References 
2
0.38
18
Authors
5
Name
Order
Citations
PageRank
Paola Bonizzoni150252.23
Gianluca Della Vedova234236.39
Yuri Pirola312815.79
Marco Previtali4225.45
Raffaella Rizzi513013.58