Abstract |
---|
Arrays of integers are often compressed in search engines. Though there are many ways to compress integers, we are interested in the popular byte-oriented integer compression techniques (e.g., VByte or Google's varint-GB). Although not known for their speed, they are appealing due to their simplicity and engineering convenience. Amazon's varint-G8IU is one of the fastest byte-oriented compression techniques published so far. It makes judicious use of the powerful single-instruction-multiple-data (SIMD) instructions available in commodity processors. To surpass varint-G8IU, we present Stream VByte, a novel byte-oriented compression technique that separates the control stream from the encoded data. Like varint-G8IU, Stream VByte is well suited for SIMD instructions. We show that Stream VByte decoding can be up to twice as fast as varint-G8IU decoding over real data sets. In this sense, Stream VByte establishes new speed records for byte-oriented integer compression, at times exceeding the speed of the memcpy function. On a 3.4 GHz Haswell processor, it decodes more than 4 billion differentially-coded integers per second from RAM to L1 cache. |
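The abstract's core idea, storing a 2-bit per-integer length code in a control stream separate from the value bytes, can be illustrated with a scalar sketch. This is a simplified illustration only: the function names are hypothetical, the exact bit ordering is an assumption, and the speed of the real format comes from SIMD shuffle instructions driven by the control bytes, which this sketch does not attempt to reproduce.

```python
def svb_encode(values):
    """Encode 32-bit integers into (control stream, data stream).

    Each value takes 1-4 bytes; its length is recorded as a 2-bit code,
    with four codes packed per control byte. Value bytes are appended to
    a separate, contiguous data stream (the layout Stream VByte uses).
    """
    control, data = bytearray(), bytearray()
    for i, v in enumerate(values):
        nbytes = max(1, (v.bit_length() + 7) // 8)  # 1..4 bytes per value
        code = nbytes - 1                           # 2-bit length code
        if i % 4 == 0:
            control.append(0)                       # start a new control byte
        control[-1] |= code << (2 * (i % 4))        # pack 4 codes per byte
        data += v.to_bytes(nbytes, "little")
    return bytes(control), bytes(data)


def svb_decode(control, data, count):
    """Decode `count` integers back from the two streams."""
    out, pos = [], 0
    for i in range(count):
        code = (control[i // 4] >> (2 * (i % 4))) & 0b11
        nbytes = code + 1
        out.append(int.from_bytes(data[pos:pos + nbytes], "little"))
        pos += nbytes
    return out
```

Because the control bytes sit in their own stream, a vectorized decoder can read one control byte, look up a precomputed shuffle mask, and emit four decoded integers per SIMD operation without the branch-heavy bit tests that make classic VByte slow.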
Year | DOI | Venue |
---|---|---|
2018 | 10.1016/j.ipl.2017.09.011 | Information Processing Letters |
Keywords | DocType | Volume |
---|---|---|
Data compression,Indexing,Vectorization,SIMD instructions,Algorithms | Journal | 130 |

ISSN | Citations | PageRank |
---|---|---|
0020-0190 | 5 | 0.46 |

References | Authors |
---|---|
4 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Daniel Lemire | 1 | 821 | 52.14 |
Nathan Kurz | 2 | 31 | 2.45 |
Christoph Rupp | 3 | 5 | 0.80 |