Abstract | ||
---|---|---|
De novo genome assembly is the process of stitching short DNA sequences to generate longer DNA sequences, without using any reference sequence for alignment. It enables high-throughput genome sequencing and thus accelerates the discovery of new genomes. In this paper, we present a toolkit, called PPA-assembler, for de novo genome assembly in a distributed setting. The operations in our toolkit provide strong performance guarantees, and can be assembled to implement various sequencing strategies. PPA-assembler adopts the popular de Bruijn graph based approach for sequencing, and each operation is implemented as a program in Google’s Pregel framework which can be easily deployed in a generic cluster. Experiments on large real and simulated datasets demonstrate that PPA-assembler is much more efficient than the state of the art while providing comparable sequencing quality. PPA-assembler has been open-sourced at https://github.com/yaobaiwei/PPA-Assembler. |

Year | DOI | Venue |
---|---|---|
2021 | 10.1109/TCBB.2019.2920912 | IEEE/ACM Transactions on Computational Biology and Bioinformatics |

Keywords | DocType | Volume |
---|---|---|
Genome assembly, graph, distributed, vertex-centric, Pregel, DNA, read, contig, k-mer | Journal | 18 |

Issue | ISSN | Citations |
---|---|---|
2 | 1545-5963 | 1 |

PageRank | References | Authors |
---|---|---|
0.35 | 0 | 6 |

Name | Order | Citations | PageRank |
---|---|---|---|
Guimu Guo | 1 | 6 | 2.48 |
Hongzhi Chen | 2 | 47 | 13.00 |
Da Yan | 3 | 387 | 34.45 |
James Cheng | 4 | 2044 | 101.89 |
Jake Chen | 5 | 1 | 0.69 |
Zechen Chong | 6 | 5 | 1.03 |