Title
A Speculative Parallel Optimization Method for Industrial Big Data Algorithms
Abstract
Apache Spark is one of the most advanced distributed computing platforms at present, and a growing number of industrial big data algorithms are implemented on it to achieve better performance. However, some of these algorithms cannot be executed in parallel on Apache Spark because of their complex internal control dependencies. To solve this problem, this paper introduces the Software Thread-Level Speculation technique and proposes a method that optimizes industrial big data algorithms so that they run in parallel on Apache Spark. Specifically, the proposed method analyzes how complex internal dependencies affect an algorithm's parallelism, speculatively partitions the algorithm into subtasks to overcome these dependencies, and predicts the inputs of the subtasks. After the subtasks are executed in parallel, their results are collected and validated; correct results are kept and committed, while incorrect ones are discarded. In this way, optimal parallelism for industrial big data algorithms can be achieved. Experiments show that, with the proposed method, the particle swarm optimization algorithm achieves a speedup of 150%-230% compared with the non-speculative version. The proposed optimization method can therefore markedly improve the execution efficiency of poorly parallelized algorithms on Apache Spark.
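The partition, predict, execute, validate, and commit cycle described in the abstract can be illustrated with a small sketch. This is not the paper's implementation and does not use Apache Spark. It uses a local thread pool and a running-maximum loop as a stand-in for an algorithm with a loop-carried dependency. The names run_chunk and speculative_fold, and the idea of passing the predicted inputs in as a list, are assumptions made only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chunk(state, chunk):
    """Sequential kernel: each step depends on the state left by the previous one."""
    for x in chunk:
        state = max(state, x)  # stand-in for a "best so far" update
    return state


def speculative_fold(chunks, initial, guesses):
    """Run chunks in parallel from predicted start states, then validate in order.

    chunks  -- list of lists, the speculative partition of the input
    initial -- the true start state of the first chunk
    guesses -- predicted start states for chunks 1..n-1 (hypothetical predictor output)
    Returns (final_state, number_of_mispredicted_chunks).
    """
    starts = [initial] + list(guesses)
    if len(starts) != len(chunks):
        raise ValueError("need one predicted start state per chunk after the first")

    # Speculative phase: every chunk runs at once from its predicted input.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_chunk, starts, chunks))

    # Validation phase: walk the chunks in program order.
    state = initial
    misses = 0
    for start, chunk, result in zip(starts, chunks, results):
        if start == state:
            state = result  # prediction was right: commit the speculative result
        else:
            misses += 1
            state = run_chunk(state, chunk)  # prediction was wrong: discard and re-execute
    return state, misses


if __name__ == "__main__":
    parts = [[9, 1], [2, 3], [4, 5], [6, 7]]
    print(speculative_fold(parts, 0, [9, 9, 9]))  # accurate predictions
    print(speculative_fold(parts, 0, [0, 0, 0]))  # inaccurate predictions
```

In this sketch the final state is the same whether or not the predictions are accurate. Only the number of chunks that must be re-executed sequentially changes, which is why the quality of the input prediction determines how much speedup speculation can deliver.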
Year
2019
DOI
10.1109/ICII.2019.00077
Venue
2019 IEEE International Conference on Industrial Internet (ICII)
Keywords
parallel computing, Apache Spark, speculative computing, industrial big data algorithms
DocType
Conference
ISBN
978-1-7281-2978-5
Citations
0
PageRank
0.34
References
6
Authors
3
Name
Order
Citations
PageRank
Zhoukai Wang100.34
Huaijun Wang22013.02
Junhuai Li33916.44