Title
A Novel Compression Algorithm Decision Method For Spark Shuffle Process
Abstract
With the wide adoption of the Spark big data platform, several problems have surfaced in practical use, and performance optimization is one of the most important. The Shuffle module is one of the core modules of Spark, and it is also an important module in several other distributed big data computing frameworks. Its design is a key factor that directly determines the performance of a big data computing framework. The main factors to optimize in the Shuffle process are CPU utilization, I/O read/write rate, and network transmission rate, and any one of them may become the bottleneck while an application executes. Network transmission time, I/O read and write time, and CPU utilization are all closely related to the size of the data being processed. For this reason, Spark provides compression configuration options and several compression algorithms for users to choose from. Different compression algorithms differ in compression speed and compression ratio, but most users keep the default configuration even though they run different applications, so the optimal configuration is not achieved. To obtain the optimal compression configuration for the Shuffle process, this paper proposes a cost optimization model for the Spark Shuffle process, which enables users to get the best compression configuration before application execution. The experimental results show that the prediction model for compression configuration has an accuracy of 58.3%, and that the proposed cost optimization model can improve performance by 48.9%.
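The idea of choosing a compression configuration from a cost estimate before the application runs can be illustrated with a minimal Python sketch. This is not the model from the paper: the cost formula (CPU time plus disk I/O time plus network time), the `CodecProfile` class, the function names, and all throughput and ratio numbers are hypothetical placeholders. Only the configuration keys `spark.shuffle.compress` and `spark.io.compression.codec`, and the codec names `lz4`, `lzf`, and `snappy`, come from Spark itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CodecProfile:
    """Hypothetical per-codec measurements (not taken from the paper)."""
    compress_mb_s: float    # compression throughput, MB of input per second
    decompress_mb_s: float  # decompression throughput, MB of output per second
    ratio: float            # compressed size / original size (smaller is better)


# Placeholder numbers for illustration only; measure them on your own cluster.
PROFILES: Dict[str, CodecProfile] = {
    "lz4": CodecProfile(400.0, 1500.0, 0.60),
    "snappy": CodecProfile(300.0, 1000.0, 0.62),
    "lzf": CodecProfile(200.0, 500.0, 0.58),
}


def shuffle_cost(data_mb: float, disk_mb_s: float, net_mb_s: float,
                 codec: Optional[CodecProfile]) -> float:
    """Estimated seconds to shuffle data_mb; codec=None means no compression."""
    if codec is None:
        moved_mb, cpu_s = data_mb, 0.0
    else:
        moved_mb = data_mb * codec.ratio
        cpu_s = data_mb / codec.compress_mb_s + data_mb / codec.decompress_mb_s
    io_s = 2 * moved_mb / disk_mb_s  # written on the map side, read on the reduce side
    net_s = moved_mb / net_mb_s      # transferred once over the network
    return cpu_s + io_s + net_s


def choose_shuffle_conf(data_mb: float, disk_mb_s: float, net_mb_s: float,
                        profiles: Dict[str, CodecProfile] = PROFILES) -> Dict[str, str]:
    """Return the Spark settings with the lowest estimated shuffle cost."""
    best_name: Optional[str] = None
    best_cost = shuffle_cost(data_mb, disk_mb_s, net_mb_s, None)
    for name, profile in profiles.items():
        cost = shuffle_cost(data_mb, disk_mb_s, net_mb_s, profile)
        if cost < best_cost:
            best_name, best_cost = name, cost
    if best_name is None:
        return {"spark.shuffle.compress": "false"}
    return {"spark.shuffle.compress": "true",
            "spark.io.compression.codec": best_name}


if __name__ == "__main__":
    # A slow network makes compression attractive under these placeholder profiles.
    print(choose_shuffle_conf(data_mb=1000, disk_mb_s=100, net_mb_s=10))
```

The returned dictionary could be passed to a Spark application as `--conf key=value` pairs. Under this toy model, compression is chosen only when the time saved on disk and network outweighs the added CPU time, which is the trade-off the abstract describes.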
Year
2017
DOI
10.1109/BigData.2017.8258262
Venue
2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)
Keywords
Spark, Shuffle process, compression configuration, cost model
DocType
Conference
ISSN
2639-1589
Citations
0
PageRank
0.34
References
0
Authors
4
Name            Order  Citations  PageRank
Shanshan Huang  1      33         8.82
Jungang Xu      2      1          3.39
Renfeng Liu     3      1          1.36
Husheng Liao    4      20         11.82