Abstract | ||
---|---|---|
HDFS (Hadoop Distributed File System), the data storage layer of the Hadoop ecosystem, provides read and write interfaces for many upper-level applications. The read/write performance of HDFS is affected by hardware such as disk, network, and even CPU and memory. When the underlying storage system and transmission network of HDFS use high-performance devices, read/write performance improves to a certain extent. However, because of the complex software stack, the improvement falls short of the performance gain of the devices themselves. HDFS can store petabytes of data on inexpensive machines, so equipping it with ultra-high-performance hardware to improve performance increases cost and wastes resources. In this paper, we analyze the read/write process of HDFS and determine the proportions attributable to software and hardware processing. Under the test environment and methods used in this paper, we find that the storage system accounts for 19.7% of the impact on HDFS and the network for 62.5%. We test the baseline performance of various hardware devices and their use with HDFS; combined with hardware utilization analysis, we find that popular storage systems and networks can improve HDFS write performance by 257% and 207%, respectively. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1007/s10586-022-03597-0 | Cluster Computing |
Keywords | DocType | Volume |
HDFS, Read and write process, Performance test, SSD, InfiniBand | Journal | 25 |
Issue | ISSN | Citations |
5 | 1386-7857 | 0 |
PageRank | References | Authors |
0.34 | 1 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Liu Yun | 1 | 0 | 0.34 |
Zhang Xiao | 2 | 1 | 2.40 |
Liu Binbin | 3 | 0 | 0.34 |
Xiaonan Zhao | 4 | 1 | 2.06 |