Title
Boosting data filtering on columnar encoding with SIMD
Abstract
In columnar databases, data is generally stored in an encoded format to save storage space and reduce I/O. Popular encoding schemes include dictionary encoding, delta encoding, run-length encoding, and bit-packed encoding. In many open-source columnar data formats, querying encoded data requires first decoding the data into memory, which is time-consuming. In this paper, we design several novel SIMD-based algorithms to speed up query execution on encoded data. Our algorithms use SIMD to vectorize execution and skip unnecessary decoding, filtering up to 18 billion numbers per second on a single thread. We build SBoost, a columnar data store that uses these algorithms to speed up filtering on encoded data and thereby improve query efficiency. SBoost is written in Java and invokes the SIMD algorithms through JNI, making it readily available to Java-based query platforms, which are dominant in open-source data analytic systems. SBoost shows strong potential for both disk-based analytic queries and in-memory queries, reducing query time by up to 90% compared to Apache Parquet.
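To make the idea of filtering without decoding concrete, here is a minimal sketch of an equality filter that runs directly on bit-packed words. It is not the paper's algorithm: it uses plain 64-bit arithmetic (often called SWAR, SIMD within a register) in place of real SIMD instructions, and it assumes the bit width divides 64 and that all values are non-negative. The class and method names (`PackedEqualityFilter`, `pack`, `replicate`, `filterEquals`) are hypothetical and do not come from SBoost.

```java
import java.util.ArrayList;
import java.util.List;

final class PackedEqualityFilter {
    // Pack values (each < 2^width) into 64-bit words, lowest field first.
    // Assumption: width divides 64, so no value straddles a word boundary.
    static long[] pack(int[] values, int width) {
        int perWord = 64 / width;
        long[] words = new long[(values.length + perWord - 1) / perWord];
        for (int i = 0; i < values.length; i++) {
            words[i / perWord] |= ((long) values[i]) << ((i % perWord) * width);
        }
        return words;
    }

    // Repeat a width-bit pattern in every field of a 64-bit word.
    static long replicate(long field, int width) {
        long out = 0L;
        for (int shift = 0; shift < 64; shift += width) {
            out |= field << shift;
        }
        return out;
    }

    // Return the indices of all packed values equal to target,
    // comparing every field of a word at once instead of unpacking each value.
    static List<Integer> filterEquals(long[] words, int count, int width, int target) {
        int perWord = 64 / width;
        long high = replicate(1L << (width - 1), width); // top bit of each field
        long low = ~high;                                // remaining bits of each field
        long pattern = replicate((long) target, width);
        List<Integer> hits = new ArrayList<>();
        for (int w = 0; w < words.length; w++) {
            long x = words[w] ^ pattern;                 // a field is 0 iff it equals target
            // Adding `low` carries into a field's top bit iff its lower bits are non-zero;
            // the carry never crosses into the next field.
            long nonZero = (((x & low) + low) | x) & high;
            long match = ~nonZero & high;                // top bit set for matching fields
            while (match != 0) {
                int bit = Long.numberOfTrailingZeros(match);
                int index = w * perWord + bit / width;
                if (index < count) {                     // skip zero padding in the last word
                    hits.add(index);
                }
                match &= match - 1;                      // clear the lowest set bit
            }
        }
        return hits;
    }
}
```

With 4-bit values, each 64-bit word is compared against 16 packed entries in a handful of instructions. Wider SIMD registers apply the same principle to more entries per step, and handling bit widths that do not divide the register size takes extra alignment work that this sketch leaves out.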
Year: 2018
DOI: 10.1145/3211922.3211932
Venue: DaMoN
Field: Regular expression, Computer science, Parallel computing, SIMD, Boosting (machine learning), Decoding methods, Java, Delta encoding, Encoding (memory), Speedup
DocType: Conference
ISBN: 978-1-4503-5853-8
Citations: 0
PageRank: 0.34
References: 21
Authors: 2
Name             Order  Citations  PageRank
Hao Jiang        1      59         17.96
Aaron J. Elmore  2      352        34.03