Title |
---|
AQUa: an adaptive framework for compression of sequencing quality scores with random access functionality. |
Abstract |
---|
Motivation: The past decade has seen the introduction of new technologies that significantly lowered the cost of genome sequencing. As a result, the amount of genomic data that must be stored and transmitted is increasing exponentially. To mitigate storage and transmission issues, we introduce a framework for lossless compression of quality scores. Results: This article proposes AQUa, an adaptive framework for lossless compression of quality scores. To compress these quality scores, AQUa makes use of a configurable set of coding tools, extended with a Context-Adaptive Binary Arithmetic Coding scheme. In benchmarks against generic single-pass compressors, AQUa reduces file sizes by up to 38.49% compared with GNU Gzip and by up to 6.48% compared with 7-Zip at the Ultra setting, while still providing support for random access. Compared with the purpose-built, single-pass, state-of-the-art compressor SCALCE, which does not support random access, AQUa reduces file sizes by up to 21.14%. Compared with the purpose-built, dual-pass, state-of-the-art compressor QVZ, which does not support random access, file sizes are larger by 6.42% to 33.47%. However, for one test file, the file size is 0.38% smaller, illustrating the strength of our single-pass compression framework. This work has been spurred by the current activity on genomic information representation (MPEG-G) within the ISO/IEC SC29/WG11 technical committee. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1093/bioinformatics/btx607 | BIOINFORMATICS |
Field | DocType | Volume |
---|---|---|
Data mining, Lossy compression, Computer science, Coding (social sciences), File size, Software, Bioinformatics, Data compression, Database, Random access, Context-adaptive binary arithmetic coding, Lossless compression | Journal | 34 |
Issue | ISSN | Citations |
---|---|---|
3 | 1367-4803 | 1 |
PageRank | References | Authors |
---|---|---|
0.37 | 3 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tom Paridaens | 1 | 49 | 5.91 |
Glenn Van Wallendael | 2 | 137 | 23.28 |
Wesley De Neve | 3 | 525 | 54.41 |
Peter Lambert | 4 | 538 | 67.24 |