Towards Building A Scalable Data Analytics System On Clouds: An Early Experience On Alicloud - Citegraph

Paper Info

Title
Towards Building A Scalable Data Analytics System On Clouds: An Early Experience On Alicloud

Abstract
With the development of big data, processing systems such as Hadoop and Spark are widely used to handle large-scale data. To spare users the complexity and expense of building a self-owned big data processing system, cloud providers deploy big data processing tools as cloud services. Typical examples include Amazon EMR, Azure HDInsight, and AliCloud E-MapReduce. However, building a cost-efficient system and scaling it remain challenging. In this paper, we conduct a case study on AliCloud E-MapReduce and analyze system performance on local and remote file systems. We compare the scalability of Hadoop and Spark using scale-out and scale-up strategies, respectively. Based on the analysis results, we derive several observations and implications that can help guide performance optimization.

Year	DOI	Venue
2018	10.1109/CLOUD.2018.00129	PROCEEDINGS 2018 IEEE 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD)
Keywords	Field	DocType
scalability evaluation, cloud-based data processing, SaaS	Big data processing, Data science, Spark (mathematics), Task analysis, Data analysis, Computer science, Big data, Benchmark (computing), Distributed computing, Cloud computing, Scalability	Conference
Citations	PageRank	References
0	0.34	0
Authors
7

Authors (7 rows)

Name	Order	Citations	PageRank
Congfeng Jiang	1	10	2.93
Wei Huang	2	78	24.31
Zujie Ren	3	89	8.14
Youhuizi Li	4	727	31.40
Jian Wan	5	483	56.15
Feng Cao	6	5	1.84
Jiangbin Lin	7	3	0.81
