Abstract | ||
---|---|---|
The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this article, we explore a novel approach to compute the string graph, based on the FM-index and Burrows and Wheeler Transform. We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the string graph assembler (SGA) as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, we have studied the effect of coverage rates on the running times. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1089/cmb.2017.0089 | JOURNAL OF COMPUTATIONAL BIOLOGY |
Keywords | Field | DocType |
Burrows and Wheeler Transform,genome assembly,string graph | String searching algorithm,Assemblers,External Data Representation,Commentz-Walter algorithm,Theoretical computer science,String graph,String metric,Mathematics,Sequence assembly,Lossless compression | Journal |
Volume | Issue | ISSN |
24.0 | 10 | 1066-5277 |
Citations | PageRank | References |
2 | 0.38 | 18 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Paola Bonizzoni | 1 | 502 | 52.23 |
Gianluca Della Vedova | 2 | 342 | 36.39 |
Yuri Pirola | 3 | 128 | 15.79 |
Marco Previtali | 4 | 22 | 5.45 |
Raffaella Rizzi | 5 | 130 | 13.58 |