Title
A Unified Computation Engine for Big Data Analytics
Abstract
Large enterprises now maintain huge amounts of data across multiple backend systems, including traditional database systems and the more recently popular big data systems. Telecom providers are one example: key business data (e.g., billing information) is kept in database systems, whereas the huge volume of log data sits on HDFS and is queried through Hive. Providing insightful analytics over such data is a challenging task. Traditional enterprise data warehouses, which depend on careful up-front database design, cannot meet data scientists' need for agile, arbitrary access to any useful data (such as the log data). In this paper, we propose Octopus, a unified computation engine for big data analytics that effectively and efficiently bridges data scientists and the data warehouse. First, Octopus uses a SQL-like approach to unify database queries and machine learning algorithms. Next, Octopus reduces the running time of such big data analytic tasks by scheduling subtasks optimally across the backend systems. A proof-of-concept prototype verifies that Octopus runs much faster than Spark. For example, Octopus is 4.58× faster than the recent Spark 1.4.0 on a complex analytic task and 5.25× faster on a simple aggregation query.
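The abstract does not say how Octopus decides where each subtask runs. As a rough illustration only, the sketch below assumes a simple greedy cost model: every subtask has a hypothetical estimated compute time on each backend, plus a uniform per-GB transfer penalty whenever it runs on a backend other than the one holding its input data, and the scheduler picks the cheapest backend for each subtask independently. `Subtask`, `schedule`, the backend names, and all numbers are hypothetical and are not Octopus's actual optimizer.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    """Hypothetical unit of work produced by decomposing an analytic task."""
    name: str
    data_location: str               # backend that currently holds the input data
    input_gb: float                  # size of the input data in GB
    compute_cost: dict[str, float]   # assumed compute seconds on each candidate backend


def total_cost(task: Subtask, backend: str, transfer_sec_per_gb: float) -> float:
    """Compute time plus data movement if the task runs away from its data."""
    move = 0.0 if backend == task.data_location else task.input_gb * transfer_sec_per_gb
    return task.compute_cost[backend] + move


def schedule(subtasks: list[Subtask], transfer_sec_per_gb: float) -> dict[str, tuple[str, float]]:
    """Greedily assign each subtask to the backend with the lowest estimated cost."""
    plan = {}
    for task in subtasks:
        best = min(task.compute_cost,
                   key=lambda b: total_cost(task, b, transfer_sec_per_gb))
        plan[task.name] = (best, total_cost(task, best, transfer_sec_per_gb))
    return plan


# Hypothetical workload mirroring the telecom example: billing data in a
# database, log data in Hive on HDFS, and a model-training step.
tasks = [
    Subtask("billing_agg", "db", 1.0, {"db": 2.0, "spark": 5.0}),
    Subtask("log_features", "hive", 100.0, {"hive": 30.0, "spark": 12.0}),
    Subtask("train_model", "db", 2.0, {"db": 120.0, "spark": 20.0}),
]
plan = schedule(tasks, transfer_sec_per_gb=0.5)
for name, (backend, cost) in plan.items():
    print(f"{name}: run on {backend} (est. {cost:.1f}s)")
```

With these made-up numbers, the aggregation stays in the database and the log processing stays in Hive because moving their data would cost more than it saves, while model training moves its small input to Spark. A real optimizer would also have to account for dependencies between subtasks and for intermediate results flowing between backends, which this greedy sketch ignores.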
Year
2015
DOI
10.1109/BDC.2015.41
Venue
2015 IEEE/ACM 2nd International Symposium on Big Data Computing (BDC)
Keywords
Apache Spark, Big Data, Hybrid Computation Engine
DocType
Conference
Citations
2
PageRank
0.37
References
0
Authors
6
Name          Order  Citations  PageRank
Chenyang Xu   1      585        23.07
Yanjie Chen   2      2          0.37
Qin Liu       3      2          0.71
Weixiong Rao  4      203        27.25
Hong Min      5      62         5.42
Gong Su       6      291        42.46