Abstract |
---|
Consumable analytics attempt to address the shortage of skilled data analysts in many organizations by offering analytic functionality in a form more familiar to in-house experts. Providing consumable analytics for Big Data faces three main challenges. The first is making the analytics algorithms run in a distributed fashion so that Big Data can be analyzed in a timely manner. The second is providing an easy interface that lets in-house experts run these algorithms in a distributed fashion while minimizing the learning cycle and rewrites of existing code. The third is running the analytics on data of different formats stored in heterogeneous data stores. In this paper, we address these challenges with the proposed QDrill. We introduce the Analytics Adaptor extension for Apache Drill, a schema-free SQL query engine for non-relational storage. The Analytics Adaptor introduces the Distributed Analytics Query Language for invoking data mining algorithms from within Drill's standard SQL query statements. The adaptor allows using any sequential single-node data mining library (e.g., WEKA) and makes its algorithms run in a distributed fashion without having to rewrite them. We evaluate QDrill against Apache Mahout. The evaluation shows that QDrill outperforms Mahout in the Updatable model training and scoring phases while keeping almost the same performance for Non-Updatable model training. QDrill is more scalable and offers an easier interface, no storage overhead, and the entire WEKA algorithm repository, with the ability to extend to algorithms from other data mining libraries. |

Year | DOI | Venue |
---|---|---|
2016 | 10.1109/BigDataCongress.2016.23 | 2016 IEEE International Congress on Big Data (BigData Congress) |

Keywords | Field | DocType |
---|---|---|
Big Data, Analytics, SQL, Data Mining, Distributed, Apache Drill, WEKA | SQL, Learning cycle, Data mining, Query language, Software analytics, Computer science, Semantic analytics, Analytics, Big data, Database, Scalability | Conference |

ISSN | ISBN | Citations |
---|---|---|
2379-7703 | 978-1-5090-2623-4 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 14 | 5 |

Name | Order | Citations | PageRank |
---|---|---|---|
Shady Khalifa | 1 | 5 | 2.27 |
Patrick Martin | 2 | 148 | 18.22 |
D. Rope | 3 | 7 | 1.98 |
Mike McRoberts | 4 | 5 | 0.92 |
Craig Statchuk | 5 | 8 | 3.40 |