Title
Fast Short Read De-novo Assembly Using Overlap-Layout-Consensus Approach.
Abstract
De-novo genome assembly is a challenging computational problem for which several pipelines have been developed. The advent of long-read sequencing technology has produced a new set of algorithmic approaches to assembly. In this work, we identify that one of these new, fast long-read assembly techniques (using Minimap2 and Miniasm) can be modified for short-read assembly, which motivated us to customize it for the short-read scenario. We compare our proposed de-novo assembly pipeline (MiniSR) with three other recently developed programs for the assembly of bacterial and small eukaryotic genomes. We document two trade-offs: one between speed and accuracy, and the other between contiguity and base-calling errors. Our proposed assembly pipeline shows a good balance in these trade-offs.
The resulting pipeline is 6 and 2.2 times faster than the short-read assemblers SPAdes and SGA, respectively. MiniSR generates assemblies with higher N50 and NGA50 than SGA, although its assemblies are less complete and less accurate than those from SPAdes. A third tool, SOAPdenovo2, is as fast as our proposed pipeline but produces assemblies of poorer quality.
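The abstract compares assemblers by N50, a standard contiguity metric. As a minimal sketch (not taken from the paper), N50 can be computed from a list of contig lengths as follows. The function name `n50` is illustrative only, and NGA50, which additionally requires alignment of contigs to a reference genome, is not covered here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length
    # until half of the total assembly length is reached.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher N50 means that more of the assembly sits in long contigs. It says nothing about correctness, which is why the abstract reports contiguity and base-calling errors as separate sides of a trade-off.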
Year: 2020
DOI: 10.1109/TCBB.2018.2875479
Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Keywords: Pipelines, Genomics, Bioinformatics, Indexing, Tools, DNA
Field: Genome, Contiguity, Pipeline transport, Computational problem, Computer science, Parallel computing, Artificial intelligence, Sequence assembly, Machine learning
DocType: Journal
Volume: 17
Issue: 1
ISSN: 1545-5963
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name
Order
Citations
PageRank
Arash Bayat102.03
Nandan Deshpande230932.76
Marc R. Wilkins33416.39
Sri Parameswaran41062102.76