Title
Towards Timely, Resource-Efficient Analyses Through Spatially-Aware Constructs within Spark
Abstract
Across several domains there has been substantial growth in data volumes. A majority of the generated data are geotagged. These data include a wealth of information that can inform insights, planning, and decision-making. The proliferation of open-source analytical engines has democratized access to tools and processing frameworks for analyzing data. However, several of these analytical engines do not include streamlined support for spatial data wrangling and processing. Here, we present our language-agnostic methodology for effective analyses over voluminous spatiotemporal datasets using Spark. In particular, we introduce support for spatial data processing within the foundational constructs underpinning the development of Spark programs: DataFrames, Datasets, and RDDs. Our empirical benchmarks demonstrate the suitability of our methodology; in contrast to alternative distributed spatial analytics frameworks, we achieve over 2x speed-up for spatial range queries. Our methodology also makes effective use of resources, reducing disk I/O by a factor of 18, network I/O by 5 orders of magnitude, and peak memory utilization by 58% for the same set of analytic tasks.
Year
2020
DOI
10.1109/UCC48980.2020.00024
Venue
2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC)
Keywords
Spatial Analytics, Data Wrangling, Analytical Engines
DocType
Conference
ISSN
2373-6860
ISBN
978-1-6654-1563-7
Citations
0
PageRank
0.34
References
22
Authors
3
Name, Order, Citations, PageRank
Daniel Rammer, 1, 7, 3.16
Sangmi Lee Pallickara, 2, 170, 24.46
Shrideep Pallickara, 3, 837, 92.72