Title
Too Big to Eat: Boosting Analytics Data Ingestion from Object Stores with Scoop
Abstract
Extracting value from data stored in object stores, such as OpenStack Swift and Amazon S3, can be problematic in common scenarios where analytics frameworks and object stores run in physically disaggregated clusters. One of the main problems is that analytics frameworks must ingest large amounts of data from the object store prior to the actual computation; this incurs significant resource and performance overhead. To overcome this problem, we present Scoop. Scoop enables analytics frameworks to benefit from the computational resources of object stores to optimize the execution of analytics jobs. Scoop achieves this by enabling the addition of ETL-type actions to the data upload path and by offloading querying functions to the object store through a rich and extensible active object storage layer. As a proof-of-concept, Scoop enables Apache Spark SQL selections and projections to be executed close to the data in OpenStack Swift for accelerating analytics workloads of a smart energy grid company (GridPocket). Our experiments in a 63-machine cluster with real IoT data and SQL queries from GridPocket show that Scoop exhibits query execution times up to 30x faster than the traditional “ingest-then-compute” approach.
Year
2017
DOI
10.1109/ICDE.2017.243
Venue
2017 IEEE 33rd International Conference on Data Engineering (ICDE)
Keywords
boosting analytics, data ingestion, object stores, OpenStack Swift, Amazon S3, Scoop, ETL-type actions, querying functions, Apache Spark SQL selections, IoT data, SQL queries, GridPocket
Field
Active object, SQL, Data mining, Spark (mathematics), Swift, Computer science, SCOOP, Boosting (machine learning), Analytics, Big data, Database
DocType
Conference
ISSN
1084-4627
ISBN
978-1-5090-6544-8
Citations
1
PageRank
0.37
References
22
Authors
13
Name                   Order  Citations  PageRank
Yosef Moatti           1      65         4.35
Eran Rom               2      33         3.95
Raul Gracia-Tinedo     3      72         6.26
Dalit Naor             4      1084       105.18
Doron Chen             5      23         2.58
Josep Sampé            6      23         3.62
Marc Sánchez Artigas   7      104        17.46
Pedro García-López     8      536        45.60
Filip Gluszak          9      1          0.37
Eric Deschdt           10     1          0.37
Francesco Pace         11     4          2.13
Daniele Venzano        12     221        16.42
Pietro Michiardi       13     1512       111.53