Abstract |
---|
Apache Spark is one of the most advanced distributed computing platforms available today, and an increasing number of industrial big data algorithms are implemented on it to achieve better performance. However, some of these algorithms cannot be executed in parallel on Apache Spark because of their complex internal control dependencies. To solve this problem, this paper introduces the Software Thread-Level Speculation technique and proposes a method that optimizes industrial big data algorithms so that they run in parallel on Apache Spark. Specifically, the proposed method analyzes how complex internal dependencies limit an algorithm's parallelism, speculatively partitions the algorithm into subtasks to overcome these dependencies, and predicts the inputs of the subtasks. After the subtasks are executed in parallel, their results are collected and validated: correct results are kept and committed, while incorrect ones are discarded. In this way, optimal parallelism for industrial big data algorithms can be achieved. Experiments show that with the proposed method, the particle swarm optimization algorithm achieves a speedup of 150%-230% compared with the non-speculative version. Therefore, the proposed optimization method markedly enhances the execution efficiency of poorly parallelized algorithms on Apache Spark. |
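The speculate, execute in parallel, validate, then commit-or-discard cycle described in the abstract can be sketched outside Spark as a plain Python illustration. All function names and the validation rule below are hypothetical assumptions for illustration, not the authors' implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_run(subtasks, predict_input, initial):
    """Hypothetical sketch of thread-level speculation over a chain of
    dependent subtasks: subtask i normally needs subtask i-1's output."""
    # Speculation: guess each subtask's input before its predecessor finishes.
    predictions = [predict_input(i, initial) for i in range(len(subtasks))]
    # Parallel execution of all subtasks on their predicted inputs.
    with ThreadPoolExecutor() as pool:
        speculative = list(pool.map(lambda tp: tp[0](tp[1]),
                                    zip(subtasks, predictions)))
    # Validation and commit: a result is kept only if its predicted input
    # equals the true input produced by the previous committed subtask.
    committed, state = [], initial
    for task, pred, res in zip(subtasks, predictions, speculative):
        if pred != state:        # misprediction: discard, re-execute sequentially
            res = task(state)
        committed.append(res)
        state = res              # true input of the next subtask
    return committed
```

A misprediction falls back to sequential re-execution, so the committed results always match a purely sequential run; any speedup comes from the predictions that turn out to be correct.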
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICII.2019.00077 | 2019 IEEE International Conference on Industrial Internet (ICII) |
Keywords | DocType | ISBN |
---|---|---|
parallel computing,Apache Spark,speculative computing,industrial big data algorithms | Conference | 978-1-7281-2978-5 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 6 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhoukai Wang | 1 | 0 | 0.34 |
Huaijun Wang | 2 | 20 | 13.02 |
Junhuai Li | 3 | 39 | 16.44 |