Abstract |
---|
Apache Spark is one of the most advanced distributed computing platforms available today, and an increasing number of industrial big data algorithms are implemented on it to achieve better performance. However, some of these algorithms cannot be executed in parallel on Apache Spark because of their complex internal control dependencies. To solve this problem, this paper introduces the Software Thread-Level Speculation technique and proposes a method that optimizes industrial big data algorithms so that they run in parallel on Apache Spark. Specifically, the proposed method analyzes how complex internal dependencies limit an algorithm's parallelism, speculatively partitions the algorithm into subtasks to overcome these dependencies, and predicts the inputs of the subtasks. After the subtasks are executed in parallel, their results are collected and validated: correct results are kept and committed, while incorrect ones are discarded. In this way, optimal parallelism for industrial big data algorithms can be achieved. Experiments show that with the proposed method, the particle swarm optimization algorithm achieves a speedup of 150%-230% compared with the non-speculative version. Therefore, the proposed optimization method markedly enhances the execution efficiency of poorly parallelized algorithms on Apache Spark. |
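The speculate, execute in parallel, validate, then commit-or-discard cycle described in the abstract can be sketched outside Spark as a plain Python illustration. All function names and the validation rule below are hypothetical assumptions for illustration, not the authors' implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_run(subtasks, predict_input, initial):
    """Hypothetical sketch of thread-level speculation over a chain of
    dependent subtasks: subtask i normally needs subtask i-1's output."""
    # Speculation: guess each subtask's input before its predecessor finishes.
    predictions = [predict_input(i, initial) for i in range(len(subtasks))]
    # Parallel execution of all subtasks on their predicted inputs.
    with ThreadPoolExecutor() as pool:
        speculative = list(pool.map(lambda tp: tp[0](tp[1]),
                                    zip(subtasks, predictions)))
    # Validation and commit: a result is kept only if its predicted input
    # equals the true input produced by the previous committed subtask.
    committed, state = [], initial
    for task, pred, res in zip(subtasks, predictions, speculative):
        if pred != state:        # misprediction: discard, re-execute sequentially
            res = task(state)
        committed.append(res)
        state = res              # true input of the next subtask
    return committed
```

A misprediction falls back to sequential re-execution, so the committed results always match a purely sequential run; any speedup comes from the predictions that turn out to be correct.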
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICII.2019.00077 | 2019 IEEE International Conference on Industrial Internet (ICII) |
Keywords | DocType | ISBN |
---|---|---|
parallel computing,Apache Spark,speculative computing,industrial big data algorithms | Conference | 978-1-7281-2978-5 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 6 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhoukai Wang | 1 | 0 | 0.34 |
Huaijun Wang | 2 | 20 | 13.02 |
Junhuai Li | 3 | 39 | 16.44 |