Title
AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references.
Abstract
Motivation: De novo assemblies of genomes remain one of the most challenging applications in next-generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapidly growing number of sequenced genomes, it is now feasible to improve assemblies by guiding them with genomes from related species.
Results: Here we introduce AlignGraph, an algorithm for extending and joining de novo-assembled contigs or scaffolds guided by closely related reference genomes. It aligns paired-end (PE) reads and pre-assembled contigs or scaffolds to a close reference. From the obtained alignments, it builds a novel data structure, called the PE multipositional de Bruijn graph. The incorporated positional information from the alignments and PE reads allows us to extend the initial assemblies, while avoiding incorrect extensions and early terminations. In our performance tests, AlignGraph was able to substantially improve the contigs and scaffolds from several assemblers. For instance, 28.7-62.3% of the contigs of Arabidopsis thaliana and human could be extended, resulting in improvements of common assembly metrics, such as an increase of the N50 of the extendable contigs by 89.9-94.5% and 80.3-165.8%, respectively. In another test, AlignGraph was able to improve the assembly of a published genome (Arabidopsis strain Landsberg) by increasing the N50 of its extendable scaffolds by 86.6%. These results demonstrate AlignGraph's efficiency in improving genome assemblies by taking advantage of closely related references.
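The abstract reports its improvements as increases in N50. As a minimal sketch of how that metric is commonly computed (this is the standard definition, not code from AlignGraph), the N50 is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name n50 and the contig lengths below are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Standard definition: sort lengths in descending order and return the
    length at which the cumulative sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (not data from the paper).
before = [100, 80, 60, 40, 20]
# Same assembly after the two longest contigs are joined into one.
after = [180, 60, 40, 20]

n50_before = n50(before)
n50_after = n50(after)
increase_pct = 100.0 * (n50_after - n50_before) / n50_before

print(n50_before, n50_after, increase_pct)
```

Joining contigs leaves the total length unchanged but shifts it into fewer, longer sequences, which is why extending and joining contigs raises the N50.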
Year: 2014
DOI: 10.1093/bioinformatics/btu291
Venue: BIOINFORMATICS
Keywords: genome, sequence alignment, genomics, algorithms
Field: Sequence alignment, Genome, Data structure, Computer science, Algorithm, Genomics, Contig, De Bruijn graph, Bioinformatics, Reference standards, Sequence assembly
DocType:
Journal: Bioinformatics
Volume: 30
Issue: 12
ISSN: 1367-4803
Citations: 2
PageRank: 0.37
References: 18
Authors: 3
Name | Order | Citations | PageRank
Ergude Bao | 1 | 34 | 6.95
Tao Jiang | 2 | 1809 | 155.32
Thomas Girke | 3 | 136 | 9.39