Title
Exploring Frequented Regions in Pan-Genomic Graphs
Abstract
We consider the problem of identifying regions within a pan-genome de Bruijn graph that are traversed by many sequence paths. We call such regions, together with the subpaths that traverse them, frequented regions (FRs). In this work we formalize the FR problem and describe an efficient algorithm for finding FRs. We then propose applications of FRs in machine learning and pan-genome graph simplification, and demonstrate their effectiveness on data sets for Staphylococcus aureus (bacteria) and Saccharomyces cerevisiae (yeast). We corroborate the biological relevance of FRs, for example by identifying introgressions in yeast that aid alcohol tolerance, and show that FRs are useful for classifying yeast strains by industrial use and for visualizing pan-genomic space.
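To make the idea of a region "traversed by many sequence paths" concrete, the Python sketch below builds a small, uncompacted de Bruijn graph from toy sequences and counts how many sequence paths visit at least a fraction alpha of a chosen set of nodes. It is an illustrative simplification under stated assumptions, not the FR definition or the algorithm from this paper: the functions build_dbg and region_support, the parameter alpha, and the toy sequences are all hypothetical, and the count tests whole paths while ignoring node order, subpath contiguity, and gaps.

```python
from typing import Dict, List, Set, Tuple


def build_dbg(sequences: List[str], k: int) -> Tuple[Dict[str, int], Set[Tuple[int, int]], List[List[int]]]:
    """Build a plain (uncompacted) de Bruijn graph.

    Nodes are k-mers; an edge joins each pair of consecutive k-mers in a
    sequence. Each input sequence is returned as a path of node ids.
    """
    node_id: Dict[str, int] = {}
    edges: Set[Tuple[int, int]] = set()
    paths: List[List[int]] = []
    for seq in sequences:
        path: List[int] = []
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # setdefault assigns the next free id the first time a k-mer is seen
            path.append(node_id.setdefault(kmer, len(node_id)))
        edges.update(zip(path, path[1:]))
        paths.append(path)
    return node_id, edges, paths


def region_support(region: Set[int], paths: List[List[int]], alpha: float) -> int:
    """Count paths that visit at least a fraction alpha of the region's nodes.

    Hypothetical simplification: whole paths are tested, and node order
    and gaps within a path are ignored.
    """
    need = alpha * len(region)
    return sum(1 for path in paths if len(region.intersection(path)) >= need)


# Toy usage: three sequences sharing some 3-mers.
node_id, edges, paths = build_dbg(["ACGTAC", "ACGTTC", "GGGTAC"], k=3)
region = {node_id["CGT"], node_id["GTA"], node_id["TAC"]}
print(region_support(region, paths, alpha=1.0))  # 1
print(region_support(region, paths, alpha=0.6))  # 2
```

Lowering alpha lets paths that only partially cross the region count toward its support, which is the intuition behind tolerating variation among the paths that "frequent" a region; how the paper actually parameterizes this is not shown here.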
Year
2017
DOI
10.1145/3107411.3107427
Venue
BCB
Keywords
Pan-genomics, classification, visualization
Field
Graph, Computer science, De Bruijn graph, Bioinformatics
DocType
Conference
ISBN
978-1-4503-4722-8
Citations
0
PageRank
0.34
References
12
Authors
5
Name, Order, Citations, PageRank
Alan Cleary, 1, 0, 1.69
Indika Kahanda, 2, 8, 4.87
Brendan M. Mumey, 3, 97, 17.55
Joann Mudge, 4, 0, 1.01
Thiruvarangan Ramaraj, 5, 0, 1.35