Title
Scalable De Novo Genome Assembly Using a Pregel-Like Graph-Parallel System
Abstract
De novo genome assembly is the process of stitching short DNA sequences into longer DNA sequences without using any reference sequence for alignment. It enables high-throughput genome sequencing and thus accelerates the discovery of new genomes. In this paper, we present a toolkit, called PPA-assembler, for de novo genome assembly in a distributed setting. The operations in our toolkit provide strong performance guarantees and can be combined to implement various sequencing strategies. PPA-assembler adopts the popular de Bruijn graph based approach for sequencing, and each operation is implemented as a program in Google’s Pregel framework, which can be easily deployed in a generic cluster. Experiments on large real and simulated datasets demonstrate that PPA-assembler is much more efficient than state-of-the-art assemblers while providing comparable sequencing quality. PPA-assembler has been open-sourced at https://github.com/yaobaiwei/PPA-Assembler.
Year: 2021
DOI: 10.1109/TCBB.2019.2920912
Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Keywords: Genome assembly, graph, distributed, vertex-centric, Pregel, DNA, read, contig, k-mer
DocType: Journal
Volume: 18
Issue: 2
ISSN: 1545-5963
Citations: 1
PageRank: 0.35
References: 0
Authors: 6
Name | Order | Citations | PageRank
Guimu Guo | 1 | 6 | 2.48
Hongzhi Chen | 2 | 47 | 13.00
Da Yan | 3 | 387 | 34.45
James Cheng | 4 | 2044 | 101.89
Jake Chen | 5 | 1 | 0.69
Zechen Chong | 6 | 5 | 1.03