Abstract | ||
---|---|---|
The number of applications based on Apache Hadoop is increasing dramatically due to the robustness and dynamic features of this system. At the heart of Apache Hadoop, the Hadoop Distributed File System (HDFS) provides reliability, scalability and high availability for computation by applying a static replication strategy. However, because of the characteristics of parallel operations in the application layer, the access frequency differs from one data file to another. Consequently, maintaining the same replication mechanism for every data file might degrade performance. By rigorously considering the drawbacks of the HDFS architecture, this paper proposes an approach that dynamically replicates data files based on predictive analysis. With the help of probability theory, the utilization of each data file can be predicted in order to create an individual replication strategy. Each data file can then be replicated according to its own access potential. Hence, this approach improves data locality while keeping the redundancy of data storage comparable to that of the default replication scheme. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/CBD.2015.19 | 2015 Third International Conference on Advanced Cloud and Big Data |
Keywords | Field | DocType |
---|---|---|
Replication, HDFS, proactive prediction, Bayesian Learning, Gaussian Process | Application layer, File system, Computer science, Robustness (computer science), Redundancy (engineering), Data file, Big data, High availability, Operating system, Scalability | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-4673-8537-4 | 0 | 0.34 |
References | Authors | |
---|---|---|
25 | 5 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Dinh-Mao Bui | 1 | 32 | 3.35 |
Thien Huynh-The | 2 | 29 | 4.08 |
Sungyoung Lee | 3 | 13 | 2.00 |
Bin Li | 4 | 318 | 30.27 |
Jin Wang | 5 | 26 | 4.23 |
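
The abstract's central idea, giving each file a replication factor that follows its predicted access potential while keeping overall redundancy close to the default scheme, can be illustrated with a minimal sketch. This is not the authors' algorithm: the keywords indicate that the paper's predictor uses Bayesian learning and Gaussian processes, whereas here the predicted access weights are simply taken as given. The function `plan_replication`, its parameters, and the largest-remainder allocation rule are all hypothetical choices made for illustration. The only HDFS fact relied on is that the default replication factor is 3.

```python
from typing import Dict

DEFAULT_REPLICATION = 3  # HDFS default replication factor


def plan_replication(access_weight: Dict[str, float],
                     default: int = DEFAULT_REPLICATION,
                     minimum: int = 1) -> Dict[str, int]:
    """Split a fixed replica budget across files by predicted access weight.

    Hypothetical helper: weights are assumed non-negative and produced by
    some external predictor (not modelled here).
    """
    if not access_weight:
        return {}

    n = len(access_weight)
    budget = default * n  # same total redundancy as the static scheme
    total = sum(access_weight.values())
    if total <= 0:
        # No usable prediction: fall back to the static default.
        return {name: default for name in access_weight}

    # Every file keeps a floor of `minimum` replicas for fault tolerance.
    plan = {name: minimum for name in access_weight}
    spare = budget - minimum * n

    # Largest-remainder allocation of the spare replicas.
    shares = {name: spare * w / total for name, w in access_weight.items()}
    for name, share in shares.items():
        plan[name] += int(share)

    leftover = max(budget - sum(plan.values()), 0)
    by_remainder = sorted(shares,
                          key=lambda name: shares[name] - int(shares[name]),
                          reverse=True)
    for name in by_remainder[:leftover]:
        plan[name] += 1
    return plan


if __name__ == "__main__":
    predicted = {"hot": 70, "warm": 20, "cold": 10}  # illustrative weights
    print(plan_replication(predicted))
```

In a real cluster, a plan of this kind could be applied per file with the existing `hdfs dfs -setrep` command. How often to re-plan, and how to bound the cost of moving replicas, are questions this sketch does not address.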