Title
Haplotype-aware variant selection for genome graphs
Abstract
Graph-based genome representations have proven to be a powerful tool in genomic analysis because they encode variations found in multiple haplotypes and capture population genetic diversity. Such graphs also unavoidably contain paths that switch between haplotypes (i.e., recombinant paths) and thus do not fully match any of the constituent haplotypes. The number of such recombinant paths grows combinatorially with path length and causes inefficiencies and false positives when mapping reads. In this paper, we study the problem of finding reduced haplotype-aware genome graphs that incorporate only a selected subset of variants, yet contain paths corresponding to all α-long substrings of the input haplotypes (i.e., non-recombinant paths) with at most δ mismatches. Solving this problem optimally, i.e., minimizing the number of variants selected, was previously shown to be NP-hard [14]. Here, we first establish several inapproximability results for finding haplotype-aware reduced variation graphs of optimal size. We then present an integer linear programming (ILP) formulation for solving the problem, and experimentally demonstrate that this approach is computationally feasible for real-world problems and provides far superior reduction compared to prior approaches.
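To make the selection constraint in the abstract concrete, the following is a minimal sketch under strong simplifying assumptions that are not taken from the paper: variants are substitutions only, every haplotype is a string aligned position-by-position to a linear reference, and selecting a site is assumed to add every allele observed at that site to the graph. Under these assumptions, an α-long haplotype substring has at most δ mismatches exactly when at most δ of its differing sites are left unselected. The function names are hypothetical, and the exhaustive search stands in for the paper's ILP only on toy inputs.

```python
from itertools import combinations


def variant_sites(reference, haplotypes):
    """Positions where at least one haplotype differs from the reference."""
    return sorted({
        i
        for hap in haplotypes
        for i, (r, h) in enumerate(zip(reference, hap))
        if r != h
    })


def is_feasible(selected, reference, haplotypes, alpha, delta):
    """True if every alpha-long window of every haplotype contains at most
    delta differing positions that are NOT in the selected set."""
    n = len(reference)
    for hap in haplotypes:
        # 1 where the haplotype differs from the reference at an unselected site
        miss = [int(hap[i] != reference[i] and i not in selected) for i in range(n)]
        window = sum(miss[:alpha])
        if window > delta:
            return False
        # Slide the window one position at a time
        for start in range(1, n - alpha + 1):
            window += miss[start + alpha - 1] - miss[start - 1]
            if window > delta:
                return False
    return True


def min_variant_selection(reference, haplotypes, alpha, delta):
    """Smallest set of variant sites satisfying the window constraint.
    Exhaustive search over subsets, so only suitable for tiny examples."""
    sites = variant_sites(reference, haplotypes)
    for k in range(len(sites) + 1):
        for subset in combinations(sites, k):
            if is_feasible(set(subset), reference, haplotypes, alpha, delta):
                return set(subset)
    return set(sites)  # unreachable: selecting every site is always feasible


# Toy input (hypothetical data): two haplotypes sharing a variant at position 4
print(min_variant_selection("AAAAAAAA", ["ACAACAAA", "AAAACAAT"], alpha=4, delta=1))
# prints {4}
```

In this toy input, keeping only the shared site 4 leaves each haplotype with a single unselected difference, so no 4-long window exceeds one mismatch. The same per-window inequality could plausibly be written as one linear constraint per haplotype window over binary selection variables, which is the general shape of an ILP for this kind of problem.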
Year
2022
DOI
10.1145/3535508.3545556
Venue
Bioinformatics, Computational Biology and Biomedicine
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
3
Name | Order | Citations | PageRank
Neda Tavakoli | 1 | 1 | 1.02
Daniel Gibney | 2 | 0 | 0.34
Srinivas Aluru | 3 | 1166 | 122.83