Title
Rainbowfish: A Succinct Colored de Bruijn Graph Representation.
Abstract
The colored de Bruijn graph— a variant of the de Bruijn graph which associates each edge (i.e., k-mer) with some set of colors - is an increasingly important combinatorial structure in computational biology. Iqbal et al. demonstrated the utility of this structure for representing and assembling a collection (population) of genomes, and showed how it can be used to accurately detect genetic variants. Muggli et al. introduced VARI, a representation of the colored de Bruijn graph that adopts the BOSS representation for the de Bruijn graph topology and achieves considerable savings in space over Cortex, albeit with some sacrifice in speed. The memory-efficient representation of VARI allows the colored de Bruijn graph to be constructed and analyzed for large datasets, beyond what is possible with Cortex.In this paper, we introduce Rainbowfish, a succinct representation of the color information of the colored de Bruijn graph that reduces the space usage even further. Our representation also uses BOSS to represent the de Bruijn graph, but decomposes the color sets based on an equivalence relation and exploits the inherent skewness in the distribution of these color sets. The Rainbowfish representation is compressed based on the 0th-order entropy of the color sets, which can lead to a significant reduction in the space required to store the relevant information for each edge. In practice, Rainbowfish achieves up to a 20x improvement in space over VARI. Rainbowfish is written in C++11 and is available at https://github.com/COMBINE-lab/rainbowfish.
Year
DOI
Venue
2017
10.4230/LIPIcs.WABI.2017.18
workshop on algorithms in bioinformatics
Keywords
Field
DocType
de Bruijn graph,Succinct data structures,Rank and Select operation
Population,Colored,Equivalence relation,Combinatorics,Line graph,BEST theorem,Biology,Algorithm,De Bruijn graph,De Bruijn sequence,Genetics,De Bruijn index
Conference
Volume
Citations 
PageRank 
88
2
0.39
References 
Authors
0
3
Name
Order
Citations
PageRank
Fatemeh Almodaresi172.56
Prashant Pandey2546.17
Rob Patro311112.98