Title
Investigation of Replication Factor for Performance Enhancement in the Hadoop Distributed File System.
Abstract
The massive growth in data volume and the demand for big data utilisation have led to an increasing prevalence of Hadoop Distributed File System (HDFS) solutions. However, the performance of Hadoop, and of HDFS in particular, has limitations, and improving it remains an open problem in the research community. The ultimate goal of our research is to develop an adaptive replication system. This paper presents the first phase of that work: an investigation into the replication factor used in HDFS, to determine whether increasing the replication factor for in-demand data can improve the performance of the system. We built a physical Hadoop cluster as our experimental environment and validated our proposal using TestDFSIO and Hive with both a real-world data set (NOAA) and a synthetic one (TPC-H). Results show that increasing the replication factor of the "hot" data increases the availability and locality of the data and thus decreases the job execution time.
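The link between replication factor and data locality stated in the abstract can be illustrated with a small back-of-the-envelope model. This is only a sketch under simplifying assumptions that are not taken from the paper: replicas of a block sit on distinct DataNodes chosen uniformly at random, and a task is scheduled on a uniformly random DataNode. The function name `local_read_probability` and the 10-node cluster size are hypothetical.

```python
from fractions import Fraction


def local_read_probability(replication_factor: int, num_datanodes: int) -> Fraction:
    """Probability that a randomly placed task finds a replica on its own node.

    Simplified model (an assumption, not the paper's method): replicas are on
    distinct nodes picked uniformly, and the task's node is picked uniformly.
    """
    if replication_factor <= 0 or num_datanodes <= 0:
        raise ValueError("replication_factor and num_datanodes must be positive")
    # A block cannot have more useful replicas than there are nodes.
    effective_replicas = min(replication_factor, num_datanodes)
    return Fraction(effective_replicas, num_datanodes)


if __name__ == "__main__":
    # Hypothetical 10-node cluster; compare a few replication factors.
    for r in (1, 3, 5):
        p = local_read_probability(r, 10)
        print(f"replication factor {r}: local-read probability {float(p):.2f}")
```

Under this model the chance of a node-local read grows linearly with the replication factor until every node holds a copy. Real schedulers and rack-aware placement behave differently, so the numbers are indicative only.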
Year
2018
Venue
ICPE Companion
Field
Distributed File System, Locality, Open problem, Performance enhancement, Control engineering, Execution time, Engineering, Synthetic data sets, Big data, Distributed computing
DocType
Conference
ISBN
978-1-4503-5629-9
Citations
0
PageRank
0.34
References
12
Authors
6