Title
An Integer Linear Programming Approach for Scaffolding Based on Exemplar Breakpoint Distance.
Abstract
Reference-based scaffolding is an important step in genome sequencing that orders and orients the contigs of a draft genome based on a reference genome. In this study, we use the concept of genome rearrangement to formulate this process as an exemplar breakpoint distance (EBD)-based scaffolding problem, whose aim is to scaffold the contigs of two given draft genomes, both containing duplicate genes (or sequence markers) and each serving as a reference for the other, such that the EBD between the scaffolded genomes is minimized. The EBD-based scaffolding problem is difficult to solve because it is nondeterministic polynomial-time (NP)-hard. In this work, we design an integer linear programming (ILP)-based algorithm that solves the EBD-based scaffolding problem exactly. Our experimental results on both simulated and biological data sets show that our ILP-based scaffolding algorithm can accurately and efficiently use a reference genome to scaffold the contigs of a draft genome. Moreover, the algorithm is more accurate when it considers duplicate genes than when it does not, suggesting that duplicate genes and their exemplars are helpful for applying genome rearrangement to the reference-based scaffolding problem. Compared with RaGOO, a current state-of-the-art alignment-based scaffolder, our ILP-based scaffolding algorithm also achieves better accuracy on the biological data sets.
Year: 2022
DOI: 10.1089/cmb.2021.0399
Venue: Journal of Computational Biology
Keywords: algorithm, exemplar breakpoint distance, integer linear programming, scaffolding, sequencing
DocType: Journal
Volume: 29
Issue: 9
ISSN: 1066-5277
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name             Order  Citations  PageRank
Yi-Kung Shieh    1      0          0.34
Dao-Yuan Peng    2      0          0.34
Yu-Han Chen      3      0          0.34
Tsung-Wei Wu     4      0          0.34
Chin Lung Lu     5      423        34.59