Reference based genome compression

Paper Info

Title
Reference based genome compression

Abstract
DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while viable, cannot offer the same savings as approaches tuned to inherent biological properties. We propose an algorithm to compress a target genome given a known reference genome. The proposed algorithm first generates a mapping from the reference to the target genome, and then compresses this mapping with an entropy coder. As an illustration of the performance: applying our algorithm to James Watson's genome with hg18 as a reference, we are able to reduce the 2991 megabyte (MB) genome down to 6.99 MB, while Gzip compresses it to 834.8 MB.
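The two stages named in the abstract (build a mapping from the reference to the target, then compress that mapping) can be illustrated with a minimal sketch. This is not the authors' algorithm: the greedy k-mer matcher, the text serialization, the function names, and the parameter `k` are all assumptions made for illustration, and `zlib` stands in for the entropy coder, whose details the abstract does not give.

```python
import zlib


def build_mapping(reference: str, target: str, k: int = 4):
    """Greedily describe `target` as copies from `reference` plus literals.

    Returns a list of ("M", ref_pos, length) copy operations and
    ("L", base) literal operations. Hypothetical scheme, for illustration.
    """
    # Index each k-mer of the reference by its first occurrence.
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)

    ops = []
    t = 0
    while t < len(target):
        r = index.get(target[t:t + k])
        if r is None:
            # No k-mer match at this position: emit one literal base.
            ops.append(("L", target[t]))
            t += 1
            continue
        # Extend the match as far as reference and target agree.
        length = 0
        while (r + length < len(reference)
               and t + length < len(target)
               and reference[r + length] == target[t + length]):
            length += 1
        ops.append(("M", r, length))
        t += length
    return ops


def serialize(ops) -> bytes:
    """Encode the mapping as ASCII text, e.g. b"M0,4;LA"."""
    parts = []
    for op in ops:
        if op[0] == "M":
            parts.append(f"M{op[1]},{op[2]}")
        else:
            parts.append(f"L{op[1]}")
    return ";".join(parts).encode("ascii")


def compress_target(reference: str, target: str, k: int = 4) -> bytes:
    """Map the target onto the reference, then compress the mapping.

    zlib is a stand-in here, not the entropy coder used in the paper.
    """
    return zlib.compress(serialize(build_mapping(reference, target, k)), 9)


def decompress_target(reference: str, blob: bytes) -> str:
    """Rebuild the target from the reference and the compressed mapping."""
    out = []
    for token in zlib.decompress(blob).decode("ascii").split(";"):
        if not token:
            continue
        if token[0] == "M":
            pos, length = map(int, token[1:].split(","))
            out.append(reference[pos:pos + length])
        else:
            out.append(token[1:])
    return "".join(out)
```

The decoder needs the same reference as the encoder, so only the mapping is stored. The closer the target is to the reference, the fewer and longer the copy operations become, which is the intuition behind the large gap between the reference-based result and a generic compressor reported in the abstract.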

Year	DOI	Venue
2012	10.1109/ITW.2012.6404708	Information Theory Workshop
Keywords	Field	DocType
biology computing,data acquisition,data compression,data mining,genomics,molecular biophysics,DNA sequencing technology,James Watson genome,entropy coder,generic compression tool,genomics research,reference based genome compression,target genome	Genome,Bottleneck,Data mining,Computer science,Megabyte,Data acquisition,Genomics,DNA sequencing,Data compression,Reference genome	Conference
Volume	ISBN	Citations
abs/1204.1912	978-1-4673-0222-7	7
PageRank	References	Authors
0.54	9	4

Authors (4 rows)

Name	Order	Citations	PageRank
B.G. Chern	1	7	0.54
I. Ochoa	2	7	0.54
A. Manolakos	3	7	0.54
A. No	4	7	0.54
