Title
A speculative parallel decompression algorithm on Apache Spark.
Abstract
Data decompression is one of the most important techniques in data processing and is widely used in multimedia information transmission and processing. However, existing decompression algorithms on multicore platforms are time-consuming and do not handle large data well. To expand parallelism and improve decompression efficiency on large-scale datasets, this paper proposes a speculative parallel decompression algorithm on Apache Spark, based on the software thread-level speculation technique. By analyzing the structure of the compressed data, the algorithm first uses a function to divide the compressed data into blocks that can be decompressed independently, and then spawns a number of threads to speculatively decompress the blocks in parallel. Finally, the speculative results are merged to form the final output. Compared with the conventional parallel approach on a multicore platform, the proposed algorithm is highly efficient and achieves a high degree of parallelism by making full use of the cluster's resources. Experiments show that the proposed approach achieves an average speedup of 2.6 over the traditional approach. In addition, as the number of worker nodes grows, the execution time decreases gradually and the speedup scales linearly. The results indicate that decompression efficiency can be significantly improved by adopting this speculative parallel algorithm.
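The split, speculate, and merge pipeline described in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: it uses a local thread pool rather than Apache Spark, and it assumes the input is a concatenation of gzip members, a format chosen here only because its blocks decompress independently. The function names (`find_candidates`, `try_decompress`, `speculative_decompress`) are hypothetical. Every offset where the gzip magic bytes occur is treated as a speculative block start; the merge step commits only the results that form an unbroken chain from offset 0 and discards the rest as mis-speculations.

```python
import gzip
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: blocks are gzip members, which start with ID1, ID2, CM=deflate.
MAGIC = b"\x1f\x8b\x08"


def find_candidates(blob):
    """Return every offset where a gzip member *might* start."""
    offsets = []
    pos = blob.find(MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(MAGIC, pos + 1)
    return offsets


def try_decompress(blob, offset):
    """Speculatively decode one member starting at `offset`.

    Returns (data, bytes_consumed), or None if the guess was wrong.
    """
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip wrapper
    try:
        data = decoder.decompress(blob[offset:])
    except zlib.error:
        return None  # the magic bytes were a coincidence
    if not decoder.eof:
        return None  # stream never reached a valid trailer
    consumed = len(blob) - offset - len(decoder.unused_data)
    return data, consumed


def speculative_decompress(blob, workers=4):
    """Decompress all candidate blocks in parallel, then merge in order."""
    offsets = find_candidates(blob)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = pool.map(lambda o: try_decompress(blob, o), offsets)
        results = dict(zip(offsets, outcomes))

    # Merge: follow the chain of real block boundaries from offset 0.
    # Results for offsets that are never visited are simply dropped.
    parts, pos = [], 0
    while pos < len(blob):
        hit = results.get(pos)
        if hit is None:
            raise ValueError(f"no valid gzip member at offset {pos}")
        data, consumed = hit
        parts.append(data)
        pos += consumed
    return b"".join(parts)


if __name__ == "__main__":
    chunks = [b"alpha" * 1000, b"beta" * 1000, b"gamma" * 1000]
    blob = b"".join(gzip.compress(c) for c in chunks)
    assert speculative_decompress(blob) == b"".join(chunks)
```

Validation happens in the merge step rather than in the workers, so a candidate that decodes successfully by coincidence is still never committed unless the previous committed block ends exactly at its offset.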
Year: 2017
DOI: https://doi.org/10.1007/s11227-017-2000-3
Venue: The Journal of Supercomputing
Keywords: Parallelization, Software thread-level speculation, Speculative multithreading, Decompression, Apache Spark
Field: Data structure, Data processing, Spark (mathematics), Parallel algorithm, Computer science, Parallel computing, Speculative multithreading, Algorithm, Thread (computing), Multi-core processor, Distributed computing, Speedup
DocType: Journal
Volume: 73
Issue: 9
ISSN: 0920-8542
Citations: 0
PageRank: 0.34
References: 15
Authors
6
Name
Order
Citations
PageRank
Zhoukai Wang101.01
Yinliang Zhao24519.54
Yang Liu32194188.81
Zhong Chen4317.12
Cuocuo Lv500.34
Yuxiang Li6196.37