Title | ||
---|---|---|
A Novel Compression Algorithm For High-Throughput DNA Sequence Based On Huffman Coding Method |
Abstract | ||
---|---|---|
NGS (next-generation sequencing) technology can sequence a large volume of DNA in a single run, producing a large number of short DNA reads. Transporting and processing these data is therefore difficult. There are two kinds of compression methods for high-throughput DNA data: reference-based and reference-free. Reference-free methods adapt to DNA data from different species without storing a large reference genome. In this paper, we propose a reference-free algorithm, named HDC, that compresses high-throughput DNA data using Huffman coding and a dictionary method. The algorithm builds multiple dictionaries through Huffman coding and uses them for compression and decompression. In tests on the human, green monkey and horse genomes, HDC's lowest compression rate is 0.192, obtained when compressing the human genome with the chromosome as the compression unit. We also compared HDC with the conventional compression algorithm gzip and with two reference-free DNA compression algorithms, Leon and ORCOM. The results show that HDC performs significantly better than the other three algorithms. |
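
The abstract does not say how HDC builds its dictionaries, so the following is only a minimal sketch of the general idea of a Huffman-coded dictionary, not the authors' method. It assumes the symbols are fixed-length k-mers, and the names `build_huffman_dictionary`, `split_kmers`, `encode`, `decode` and `compression_rate` are hypothetical. It also assumes that "compression rate" means compressed size divided by original size, with the original stored as 8-bit text and the cost of storing the dictionary ignored.

```python
import heapq
from collections import Counter


def build_huffman_dictionary(symbols):
    """Return {symbol: bitstring}; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a non-empty code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing one bit to each side.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def split_kmers(seq, k):
    """Cut seq into consecutive k-mers (assumption: the last may be shorter)."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def encode(seq, k=4):
    """Return (bitstring, dictionary) for a DNA string."""
    kmers = split_kmers(seq, k)
    codes = build_huffman_dictionary(kmers)
    return "".join(codes[m] for m in kmers), codes


def decode(bits, codes):
    """Invert encode() using the same dictionary (codes are prefix-free)."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


def compression_rate(bits, seq):
    """Assumed definition: compressed bits / original bits (8 bits per base)."""
    return len(bits) / (8 * len(seq))
```

For example, `bits, codes = encode("ACGTACGTACGTTTGA")` followed by `decode(bits, codes)` recovers the input string. A real compressor would also have to store or transmit the dictionary, which this sketch leaves out of the rate.
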
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/CISP-BMEI.2018.8633219 | 2018 11TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2018) |
Keywords | Field | DocType |
High-throughput sequencing, DNA data compression, Huffman coding | Genome,Data compression ratio,Pattern recognition,Computer science,Huffman coding,Artificial intelligence,DNA sequencing,Throughput,Human genome,Data compression | Conference |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chuan He | 1 | 32 | 5.43 |
Huaiqiu Zhu | 2 | 162 | 15.27 |