Title
Ranbow: A fast and accurate method for polyploid haplotype reconstruction.
Abstract
Reconstructing haplotypes from sequencing data is one of the major challenges in genetics. Haplotypes play a crucial role in many analyses, including genome-wide association studies and population genetics. Haplotype reconstruction becomes more difficult for higher numbers of homologous chromosomes, as is often the case for polyploid plants. This complexity is compounded further by higher heterozygosity, which denotes the frequent presence of variants between haplotypes. We have designed Ranbow, a new tool for haplotype reconstruction of polyploid genomes from short-read sequencing data. Ranbow integrates all types of small variants in bi- and multi-allelic sites to reconstruct haplotypes. To evaluate Ranbow and currently available competing methods on real data, we have created and released a real gold standard dataset from sweet potato sequencing data. Our evaluations on real and simulated data clearly show Ranbow's superior performance in terms of accuracy, haplotype length, memory usage, and running time. Specifically, Ranbow is one order of magnitude faster than the next best method. The efficiency and accuracy of Ranbow make whole-genome haplotype reconstruction of complex genomes with higher ploidy feasible.
Author summary
We focus on the problem of reconstructing haplotypes for polyploid genomes. We explored the use of short-read sequence data from a highly heterozygous hexaploid genome. We observed that short-read data from strongly heterozygous organisms open up a way for haplotype reconstruction by supplying overlap information between reads. We therefore investigated the role of heterozygosity and ploidy number. Although higher heterozygosity provides more useful reads for reconstructing haplotypes, polyploidy makes it harder to assemble reads into longer sequences. We call this the problem of "Ambiguity of Merging" fragments, and we addressed it by designing a new algorithm called Ranbow. Ranbow was evaluated on real and simulated data from the genomes of tetraploid Capsella bursa-pastoris (Shepherd's Purse) and hexaploid Ipomoea batatas (sweet potato). We were able to show that our method achieved high accuracy and long assembled haplotypes in a feasible amount of time, performing at a level consistently superior to other algorithms.
Year
2020
DOI
10.1371/journal.pcbi.1007843
Venue
PLOS COMPUTATIONAL BIOLOGY
DocType
Journal
Volume
16
Issue
5
ISSN
1553-734X
Citations
0
PageRank
0.34
References
0
Authors
8
Name                    Order  Citations  PageRank
M-Hossein Moeinzadeh    1      0          0.34
Jun Yang                2      15         5.13
Evgeny Muzychenko       3      0          0.34
Giuseppe Gallone        4      0          0.34
David Heller            5      1          1.06
Knut Reinert            6      1020       105.87
Stefan A Haas           7      47         4.48
Martin Vingron          8      1754       298.16