Title
PgRC: pseudogenome-based read compressor.
Abstract
Motivation: The amount of sequencing data produced by high-throughput sequencing technologies grows at a pace exceeding the one predicted by Moore's law. One of the basic requirements is to store and transmit such huge collections of data efficiently. Despite significant interest in designing FASTQ compressors, existing tools are still imperfect in terms of compression ratio or decompression resources.
Results: We present Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm for compressing the DNA stream, based on the idea of building an approximation of the shortest common superstring over high-quality reads. Experiments show that PgRC wins in compression ratio over its main competitors, SPRING and Minicom, by up to 15% and 20% on average, respectively, while being comparably fast in decompression.
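The abstract describes PgRC as building an approximation of the shortest common superstring (SCS) over high-quality reads. As a minimal sketch of that general idea only, the classic greedy merge heuristic below repeatedly joins the two strings with the longest suffix-prefix overlap until one superstring remains. This is a textbook technique, not PgRC's actual algorithm: the function names `overlap` and `greedy_superstring` are hypothetical, and the sketch assumes exact overlaps and ignores read quality, reverse complements and sequencing errors.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if none)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads: list[str]) -> str:
    """Hypothetical greedy SCS approximation (illustration, not PgRC's method)."""
    # Deduplicate, then drop reads already contained in another read.
    strings = sorted(set(reads))
    strings = [s for s in strings if not any(s != t and s in t for t in strings)]

    while len(strings) > 1:
        # Find the ordered pair (i, j) with the largest suffix-prefix overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair: the overlapping part of the second string is stored once.
        merged = strings[best_i] + strings[best_j][best_k:]
        strings = [s for idx, s in enumerate(strings) if idx not in (best_i, best_j)]
        strings.append(merged)

    return strings[0] if strings else ""


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "ACGGTT"]
    print(greedy_superstring(reads))  # ACGTACGGTT (10 symbols instead of 18)
```

Every merge keeps both inputs as substrings, so the final string contains all reads; the saving comes from storing each overlap once. The all-pairs overlap search makes this sketch far too slow for real read collections, which is why practical tools rely on indexing structures instead.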
Year
2020
DOI
10.1093/bioinformatics/btz919
Venue
Bioinformatics
Volume
36
Issue
7
ISSN
1367-4803
Citations
0
PageRank
0.34
References
0
Authors
2
Name | Order | Citations | PageRank
Tomasz Kowalski | 1 | 124 | 24.06
Szymon Grabowski | 2 | 385 | 36.12