Title
Towards Building A Scalable Data Analytics System On Clouds: An Early Experience On Alicloud
Abstract
With the growth of big data, processing systems such as Hadoop and Spark are widely used to handle large-scale data. To spare users the complexity and cost of building a self-owned big data processing system, cloud providers tend to deploy big data processing tools as cloud services. Typical examples include Amazon EMR, Azure HDInsight and AliCloud E-MapReduce. However, building a cost-efficient system and scaling it remain challenging. In this paper, we conduct a case study on AliCloud E-MapReduce and analyze system performance on local and remote file systems. We compare the scalability of Hadoop and Spark under scale-out and scale-up strategies, respectively. Based on the analysis results, we derive several observations and implications that can help guide performance optimization.
Year
2018
DOI
10.1109/CLOUD.2018.00129
Venue
PROCEEDINGS 2018 IEEE 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD)
Keywords
scalability evaluation, cloud-based data processing, SaaS
Field
Big data processing, Data science, Spark (mathematics), Task analysis, Data analysis, Computer science, Big data, Benchmark (computing), Distributed computing, Cloud computing, Scalability
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
7
Name | Order | Citations | PageRank
Congfeng Jiang | 1 | 10 | 2.93
Wei Huang | 2 | 78 | 24.31
Zujie Ren | 3 | 89 | 8.14
Youhuizi Li | 4 | 727 | 31.40
Jian Wan | 5 | 483 | 56.15
Feng Cao | 6 | 5 | 1.84
Jiangbin Lin | 7 | 3 | 0.81