Title
Group-Scheme: SIMD-based compression algorithms for web text data
Abstract
Compression algorithms are important for data-oriented tasks, especially in the era of Big Data. Modern processors provide powerful SIMD instruction sets, which offer an opportunity for better performance. Although SIMD-based optimization of compression has been explored in some studies [2, 7], these studies usually focus on modifying existing algorithms to fit the SIMD instructions. In this paper, we propose a compression framework with a novel storage layout format, which aims to improve the instruction-level parallelizability of compression algorithms. By instantiating the framework, we design a novel compression algorithm family, called Group-Scheme, and present a parallelized version of Group-Scheme, called SIMD-Group-Scheme. We evaluate the proposed algorithms on two public TREC data sets. While remaining very competitive in compression ratio and encoding speed, SIMD-Group-Scheme significantly outperforms both the implementation without SIMD instructions and the state-of-the-art algorithm (i.e., SIMD-G8IU [7]) with respect to decoding speed.
Year: 2013
DOI: 10.1109/BigData.2013.6691617
Venue: BigData Conference
Keywords: parallel processing, compression ratio, simd, inverted index, data compression, indexing, encoding speed, integer encoding, public trec data sets, instruction-level parallelizability, index compression, simd-group-scheme, simd-based compression algorithms, simd instruction sets, text analysis, web text data, storage layout format
Field: Inverted index, Computer science, Instruction set, Parallel computing, Search engine indexing, SIMD, Compression ratio, Move-to-front transform, Data compression, Lossless compression
DocType: Conference
Volume: null
Issue: null
ISSN: 2639-1589
Citations: 1
PageRank: 0.36
References: 6
Authors: 4
Name
Order
Citations
PageRank
Xudong Zhang169563.82
Wayne Xin Zhao2127566.73
Dongdong Shan31286.11
Hongfei Yan476335.67