Boosting data filtering on columnar encoding with SIMD - Citegraph

Paper Info

Title
Boosting data filtering on columnar encoding with SIMD

Abstract
In columnar databases, data is generally stored in an encoded format to save storage space and reduce I/O. Popular encoding schemes include dictionary encoding, delta encoding, run-length encoding, and bit-packed encoding. In many open-source columnar data formats, querying encoded data requires first decoding it into memory, which is time-consuming. In this paper, we design several novel SIMD-based algorithms to speed up query execution on encoded data. Our algorithms use SIMD to vectorize execution and skip unnecessary decoding for higher efficiency, achieving a filtering throughput of up to 18 billion numbers per second on a single thread. We build SBoost, a columnar data store that uses these algorithms to speed up filtering on encoded data and thereby improve query efficiency. SBoost is written in Java and invokes the SIMD algorithms through JNI, making it readily available to Java-based query platforms, which dominate open-source data analytics systems. SBoost shows great potential for speeding up both disk-based analytic queries and in-memory queries, reducing query time by up to 90% compared to Apache Parquet.
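
The abstract's central idea, evaluating a predicate directly on packed codes instead of decoding them first, can be illustrated with a small SWAR (SIMD within a register) sketch. This is not SBoost's algorithm. It assumes a hypothetical layout in which each k-bit code sits in a (k+1)-bit lane whose top bit is a zero delimiter (dense bit-packed formats do not reserve such a bit), and it uses plain Python integers as 64-bit words rather than SIMD registers. The names `pack`, `filter_less_than`, and `dictionary_code_bound` are illustrative only. The last helper shows how, for an order-preserving (sorted) dictionary, a predicate on decoded values can be rewritten once as a bound on codes, so the scan itself needs no dictionary lookups.

```python
import bisect

WORD_BITS = 64  # treat Python ints as 64-bit machine words


def lane_masks(k):
    """Return (lanes per word, delimiter-bit mask, lowest-bit mask) for k-bit codes."""
    width = k + 1  # k value bits plus one zero delimiter bit (assumed layout)
    lanes = WORD_BITS // width
    delim = ones = 0
    for i in range(lanes):
        delim |= 1 << (i * width + k)
        ones |= 1 << (i * width)
    return lanes, delim, ones


def pack(codes, k):
    """Pack k-bit codes into words, lane 0 in the least significant bits."""
    lanes, _, _ = lane_masks(k)
    width = k + 1
    words = []
    for start in range(0, len(codes), lanes):
        word = 0
        for i, code in enumerate(codes[start:start + lanes]):
            if not 0 <= code < (1 << k):
                raise ValueError("code does not fit in k bits")
            word |= code << (i * width)
        words.append(word)
    return words


def filter_less_than(words, n, k, c):
    """Return n booleans, True where code < c, using one subtraction per word."""
    if not 0 <= c <= (1 << k):
        raise ValueError("c must lie in [0, 2**k]")
    lanes, delim, ones = lane_masks(k)
    width = k + 1
    broadcast_c = c * ones  # c copied into every lane
    result = []
    for wi, word in enumerate(words):
        # Per lane: (2**k + v) - c keeps the delimiter bit iff v >= c, and it
        # never borrows from the neighbouring lane because the result is >= 0.
        ge = ((word | delim) - broadcast_c) & delim
        lt = ~ge & delim  # delimiter bit set iff v < c
        for i in range(min(lanes, n - wi * lanes)):
            result.append(bool((lt >> (i * width + k)) & 1))
    return result


def dictionary_code_bound(sorted_dict, value):
    """With a sorted dictionary, decoded < value iff code < this bound."""
    return bisect.bisect_left(sorted_dict, value)


if __name__ == "__main__":
    dictionary = ["apple", "banana", "cherry", "date"]  # codes 0..3 fit in k = 2 bits
    column = ["date", "apple", "cherry", "banana", "apple"]
    codes = [dictionary.index(s) for s in column]
    bound = dictionary_code_bound(dictionary, "cherry")  # value < "cherry" becomes code < 2
    print(filter_less_than(pack(codes, 2), len(codes), 2, bound))
    # -> [False, True, False, True, True]
```

With 2-bit codes each lane is 3 bits wide, so one subtraction tests up to 21 codes per 64-bit word. A production version would keep the result as a bitmap and operate on wide SIMD registers rather than unpacking the answer lane by lane as this sketch does for readability.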

Year	2018
DOI	10.1145/3211922.3211932
Venue	DaMoN
Field	Regular expression, Computer science, Parallel computing, SIMD, Boosting (machine learning), Decoding methods, Java, Delta encoding, Encoding (memory), Speedup
DocType	Conference
ISBN	978-1-4503-5853-8
Citations	0
PageRank	0.34
References	21
Authors	2

Authors

Name	Order	Citations	PageRank
Hao Jiang	1	59	17.96
Aaron J. Elmore	2	352	34.03
