Title
Research on Data Storage and Processing Optimization Based on Federation HDFS and Spark.
Abstract
Hadoop and Spark provide undifferentiated services for data storage and processing, so they cannot meet the on-demand needs of different users or different types of data. To address this, this paper proposes a system architecture for data storage optimization based on Federation HDFS and Spark. Incoming data of different types or from different users is classified with the Naive Bayes algorithm. The classified data is stored in Federation HDFS under different backup policies, and Spark processes it according to its priority. In this way, differentiated service can be realized and service quality can be improved. The experimental results show that the architecture provides different storage strategies and processing priorities for data of different priorities, and that it offers high fault tolerance and reduced processing delay for high-priority data.
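The pipeline described in the abstract (classify, choose a backup policy, process by priority) can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: the feature names, the training rows, the priority classes, and the replication factors in `REPLICATION` are all assumptions made for illustration, and a plain priority queue stands in for Spark scheduling.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical mapping from priority class to a replication (backup) factor.
REPLICATION = {"high": 3, "medium": 2, "low": 1}
# Smaller rank is processed first.
RANK = {"high": 0, "medium": 1, "low": 2}


class NaiveBayes:
    """Categorical Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # feature_counts[label][feature_index][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen per feature
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / self.total)
            for i, value in enumerate(row):
                seen = self.feature_counts[label][i][value]
                score += math.log((seen + 1) / (count + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def plan(records, model):
    """Classify each record, attach a replication factor, order by priority."""
    heap = []
    for seq, (name, features) in enumerate(records):
        priority = model.predict(features)
        # seq breaks ties so equal-priority records keep arrival order
        heapq.heappush(heap, (RANK[priority], seq, name, REPLICATION[priority]))
    ordered = []
    while heap:
        _, _, name, replication = heapq.heappop(heap)
        ordered.append((name, replication))
    return ordered


# Assumed training data: (user tier, data type) -> priority class.
train_rows = [
    ("premium", "transaction"), ("premium", "log"),
    ("standard", "transaction"), ("standard", "log"),
    ("guest", "log"), ("guest", "media"),
]
train_labels = ["high", "high", "medium", "medium", "low", "low"]
model = NaiveBayes().fit(train_rows, train_labels)

# Incoming records in arrival order.
records = [
    ("b.mp4", ("guest", "media")),
    ("c.log", ("standard", "log")),
    ("a.csv", ("premium", "transaction")),
]
print(plan(records, model))
```

In this sketch the high-priority record is scheduled first and receives the largest replication factor, even though it arrived last. In a real deployment the replication factor would have to be applied to the stored file (for example with the `hdfs dfs -setrep` shell command), and the ordering would be handled by Spark's own scheduling rather than an in-memory heap.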
Year
2018
DOI
10.1007/978-3-319-93659-8_97
Venue
COMPLEX, INTELLIGENT, AND SOFTWARE INTENSIVE SYSTEMS
Field
Data processing, Spark (mathematics), Naive Bayes classifier, Computer data storage, Computer science, Data type, Fault tolerance, Systems architecture, Backup, Distributed computing
DocType
Conference
Volume
772
ISSN
2194-5357
Citations
0
PageRank
0.34
References
15
Authors
4
Name, Order, Citations, PageRank
Fangzhou Chen, 1, 0, 1.01
Peng Li, 2, 1, 2.43
He Xu, 3, 47, 12.76
Wenkang Xie, 4, 0, 0.68