Title
QDrill: Query-Based Distributed Consumable Analytics for Big Data
Abstract
Consumable analytics attempt to address the shortage of skilled data analysts in many organizations by offering analytic functionality in a form more familiar to in-house experts. Providing consumable analytics for Big Data faces three main challenges. The first challenge is making the analytics algorithms run in a distributed fashion in order to analyze Big Data in a timely manner. The second challenge is providing an easy interface that allows in-house experts to run these algorithms in a distributed fashion while minimizing the learning cycle and rewrites of existing code. The third challenge is running the analytics on data of different formats stored on heterogeneous data stores. In this paper, we address these challenges with the proposed QDrill. We introduce the Analytics Adaptor extension for Apache Drill, a schema-free SQL query engine for non-relational storage. The Analytics Adaptor introduces the Distributed Analytics Query Language for invoking data mining algorithms from within Drill's standard SQL query statements. The adaptor allows using any sequential single-node data mining library (e.g., WEKA) and makes its algorithms run in a distributed fashion without having to rewrite them. We evaluate QDrill against Apache Mahout. The evaluation shows that QDrill outperforms Mahout in the Updatable model training and scoring phases while maintaining nearly the same performance for Non-Updatable model training. QDrill is more scalable and offers an easier interface, no storage overhead, and the whole algorithm repository of WEKA, with the ability to extend to algorithms from other data mining libraries.
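The abstract does not spell out how a sequential, single-node learner is made to run in a distributed fashion. One common pattern for updatable (incremental) models is to train a partial model on each data partition and then combine the partial models. The following is a minimal Python sketch of that general pattern only; it is not QDrill's implementation and does not use WEKA or Drill. The class `CountingNaiveBayes`, the functions `train` and `train_partitioned`, and the toy rows are all hypothetical names and data introduced for illustration.

```python
import math
from collections import Counter, defaultdict


class CountingNaiveBayes:
    """Toy updatable classifier over categorical features (hypothetical, not WEKA)."""

    def __init__(self):
        self.class_counts = Counter()               # label -> number of rows seen
        self.feature_counts = defaultdict(Counter)  # label -> (column, value) -> count

    def update(self, features, label):
        """Incrementally absorb one labelled row."""
        self.class_counts[label] += 1
        for index, value in enumerate(features):
            self.feature_counts[label][(index, value)] += 1

    def merge(self, other):
        """Fold another partial model into this one by summing counts."""
        self.class_counts.update(other.class_counts)
        for label, counts in other.feature_counts.items():
            self.feature_counts[label].update(counts)
        return self

    def predict(self, features):
        """Return the label with the highest smoothed log-score."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts.get(label, {})
            score = math.log(n / total)
            for index, value in enumerate(features):
                seen = counts.get((index, value), 0)
                # Crude add-one smoothing with a fixed denominator, for illustration only.
                score += math.log((seen + 1) / (n + 2))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train(rows):
    """Sequential, single-node training over an iterable of (features, label)."""
    model = CountingNaiveBayes()
    for features, label in rows:
        model.update(features, label)
    return model


def train_partitioned(partitions):
    """Train one partial model per partition, then merge them.

    In a real engine each call to train() would run on a different node;
    here the partitions are processed in a plain loop.
    """
    combined = CountingNaiveBayes()
    for partition in partitions:
        combined.merge(train(partition))
    return combined


# Hypothetical toy data: (features, label)
rows = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "no"),
]

sequential = train(rows)
distributed = train_partitioned([rows[:3], rows[3:]])
print(distributed.predict(("sunny", "hot")))
```

Because this toy model is just a set of counts, merging partial models yields exactly the same model as one sequential pass, and the per-row learning code is reused unchanged. That exactness is a property of count-based learners and should not be assumed for every updatable algorithm.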
Year
2016
DOI
10.1109/BigDataCongress.2016.23
Venue
2016 IEEE International Congress on Big Data (BigData Congress)
Keywords
Big Data, Analytics, SQL, Data Mining, Distributed, Apache Drill, WEKA
Field
SQL, Learning cycle, Data mining, Query language, Software analytics, Computer science, Semantic analytics, Analytics, Big data, Database, Scalability
DocType
Conference
ISSN
2379-7703
ISBN
978-1-5090-2623-4
Citations
0
PageRank
0.34
References
14
Authors
5
Name              Order   Citations   PageRank
Shady Khalifa     1       5           2.27
Patrick Martin    2       148         18.22
D. Rope           3       7           1.98
Mike McRoberts    4       5           0.92
Craig Statchuk    5       8           3.40