Abstract | ||
---|---|---|
De-novo genome assembly is a challenging computational problem for which several pipelines have been developed. The advent of long-read sequencing technology has resulted in a new set of algorithmic approaches for the assembly process. In this work, we observe that one of these new and fast long-read assembly techniques (using Minimap2 and Miniasm) can be modified for the short-read assembly process. This possibility motivated us to customize a long-read assembly approach for applications in a short-read assembly scenario. Here, we compare and contrast our proposed de-novo assembly pipeline (MiniSR) with three other recently developed programs for the assembly of bacterial and small eukaryotic genomes. We have documented two trade-offs: one between speed and accuracy and the other between contiguity and base-calling errors. Our proposed assembly pipeline shows a good balance in these trade-offs. The resulting pipeline is 6 and 2.2 times faster than the short-read assemblers Spades and SGA, respectively. MiniSR generates assemblies with higher N50 and NGA50 than SGA, although its assemblies are less complete and less accurate than those from Spades. A third tool, SOAPdenovo2, is as fast as our proposed pipeline but produces assemblies of poorer quality. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/TCBB.2018.2875479 | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Keywords | Field | DocType |
Pipelines, Genomics, Bioinformatics, Indexing, Tools, DNA | Genome, Contiguity, Pipeline transport, Computational problem, Computer science, Parallel computing, Artificial intelligence, Sequence assembly, Machine learning | Journal |
Volume | Issue | ISSN |
17 | 1 | 1545-5963 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Arash Bayat | 1 | 0 | 2.03 |
Nandan Deshpande | 2 | 309 | 32.76 |
Marc R. Wilkins | 3 | 34 | 16.39 |
Sri Parameswaran | 4 | 1062 | 102.76 |