Title
The research and analysis of efficiency of hardware usage base on HDFS
Abstract
HDFS (Hadoop Distributed File System), the data storage component of the Hadoop ecosystem, provides read and write interfaces for many upper-level applications. The read/write performance of HDFS is affected by hardware such as disk and network, and even by CPU and memory. When the underlying storage system and transmission network of HDFS use high-performance devices, read/write performance improves to a certain extent. However, because of the complex software stack, the improvement falls short of the performance gain of the devices themselves. Because HDFS can store petabytes of data on inexpensive machines, equipping it with ultra-high-performance hardware to improve performance increases cost and wastes resources. In this paper, we analyze the read/write process of HDFS and determine the proportions of the software and hardware processes. Under the test environment and methods used in this paper, we find that the storage system accounts for 19.7% of the impact on HDFS and the network accounts for 62.5%. We test the baseline performance of various hardware devices and their performance when applied to HDFS. Combining these results with a hardware utilization analysis, we find that using popular storage systems and networks can improve the write performance of HDFS by 257% and 207%, respectively.
Year
2022
DOI
10.1007/s10586-022-03597-0
Venue
Cluster Computing
Keywords
HDFS, Read and write process, Performance test, SSD, InfiniBand
DocType
Journal
Volume
25
Issue
5
ISSN
1386-7857
Citations
0
PageRank
0.34
References
1
Authors
4
Name          Order  Citations  PageRank
Liu Yun       1      0          0.34
Zhang Xiao    2      1          2.40
Liu Binbin    3      0          0.34
Xiaonan Zhao  4      1          2.06