A Unified Computation Engine for Big Data Analytics - Citegraph

Paper Info

Title
A Unified Computation Engine for Big Data Analytics

Abstract
Large enterprises now maintain huge amounts of data across multiple backend systems, including traditional database systems and the more recently popular big data systems. For telecom providers, for example, key business data (e.g., billing information) is maintained in database systems, whereas the huge volume of log data resides on HDFS with Hive. Providing insightful analytics on such data is a challenging task. Traditional enterprise data warehouse systems, with their careful database design, cannot meet data scientists' need for agile, arbitrary access to any useful data (such as the log data). In this paper, we propose Octopus, a unified computation engine for big data analytics that effectively and efficiently bridges data scientists and the data warehouse. First, Octopus adopts a SQL-like approach to unify database queries and machine learning algorithms. Next, Octopus optimizes the running time of such analytic tasks by scheduling optimal subtasks to backend systems. A proof-of-concept prototype verifies that Octopus achieves much faster running times than Spark: compared with the recent Spark 1.4.0 release, Octopus runs 4.58× faster on a complex analytic task and 5.25× faster on a simple aggregation query.

Field	Value
Year	2015
DOI	10.1109/BDC.2015.41
Venue	2015 IEEE/ACM 2nd International Symposium on Big Data Computing (BDC)
Keywords	Apache Spark, Big Data, Hybrid Computation Engine
DocType	Conference
Citations	2
PageRank	0.37
References	0
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Chenyang Xu	1	585	23.07
Yanjie Chen	2	2	0.37
Qin Liu	3	2	0.71
Weixiong Rao	4	203	27.25
Hong Min	5	62	5.42
Gong Su	6	291	42.46
