Title
STASH: Fast Hierarchical Aggregation Queries for Effective Visual Spatiotemporal Explorations
Abstract
The proliferation of sensors and observational instruments enables scientists to explore natural spatiotemporal phenomena through exploratory analysis and advanced modeling. Geospatial visualization, in particular, is an intuitive tool for identifying patterns, enhancing understanding of the data, and planning subsequent analysis. However, seamless interaction between end-user devices and the sheer volume of data has been a challenge because of limited bandwidth and data access latencies. In this paper, we introduce Stash, a distributed, in-memory cache for hierarchical aggregation and query evaluation. Stash is middleware that can be loaded on top of a distributed file system. Users issue queries from a lightweight visualization interface at the front end, and the evaluations occur over the back-end storage system housing the raw data over which summarization and subsequent visualization are to be performed. Stash facilitates fast exploratory analytics by caching relevant past query results based on their frequency and freshness, so that similar future queries avoid expensive disk I/O and network usage and therefore see lower latency. Additionally, Stash handles hotspots that might result from a spike in user requests due to the spatial and temporal locality of their access patterns. Our empirical benchmarks show that a Stash-enabled system reduces the query latency of a basic system by more than a factor of five and brings it down to interactive speeds even for large, country-sized spatiotemporal queries. We contrasted Stash with existing cache-enabled analytics engines, such as ElasticSearch, and found that our Stash-enabled system reduced aggregation query latency by up to ~70%. Stash also alleviated skewed workloads through its dynamic replication scheme and improved throughput by ~40% in hotspot scenarios.
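The abstract states that query results are cached "based on their frequency and freshness" but gives no scoring rule. The following is a minimal sketch of one way such a policy could look, not the policy Stash implements: the class name `FreqFreshCache`, the `half_life` parameter, and the score (hit count decayed exponentially with time since last access) are all assumptions made for illustration. Time is passed in explicitly so the behavior is deterministic.

```python
import math


class FreqFreshCache:
    """Toy cache that evicts the entry with the lowest frequency/freshness score.

    Hypothetical illustration only: the scoring rule is an assumption,
    not the policy described in the paper.
    """

    def __init__(self, capacity, half_life=60.0):
        self.capacity = capacity
        self.half_life = half_life  # seconds after which an idle entry's score halves
        self._entries = {}          # key -> (value, hit_count, last_access_time)

    def _score(self, key, now):
        # Frequency (hit count) weighted by freshness (exponential decay with age).
        _, hits, last = self._entries[key]
        return hits * math.pow(0.5, (now - last) / self.half_life)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None  # cache miss: the caller would fall back to backend storage
        value, hits, _ = entry
        self._entries[key] = (value, hits + 1, now)
        return value

    def put(self, key, value, now):
        if key not in self._entries and len(self._entries) >= self.capacity:
            # Evict the entry that is least frequently and least recently used.
            victim = min(self._entries, key=lambda k: self._score(k, now))
            del self._entries[victim]
        hits = self._entries[key][1] if key in self._entries else 1
        self._entries[key] = (value, hits, now)


# Usage: with room for two results, the rarely used one is evicted first.
cache = FreqFreshCache(capacity=2)
cache.put("region-A", {"mean_temp": 11.2}, now=0)
cache.put("region-B", {"mean_temp": 14.8}, now=0)
cache.get("region-A", now=1)
cache.get("region-A", now=2)
cache.put("region-C", {"mean_temp": 9.5}, now=3)  # evicts "region-B"
```

The keys and values here are made-up placeholders. A real system would key entries by spatial and temporal extent and would also need the replication behavior the abstract mentions, which this sketch does not attempt.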
Year: 2019
DOI: 10.1109/CLUSTER.2019.8891029
Venue: 2019 IEEE International Conference on Cluster Computing (CLUSTER)
Keywords: in-memory storage, distributed caching, aggregation query, visual analytics, exploratory analytics
Field: Distributed File System, Automatic summarization, Data visualization, Locality of reference, Cache, Computer science, Visualization, Analytics, Data access, Distributed computing
DocType: Conference
ISSN: 1552-5244
ISBN: 978-1-7281-4735-2
Citations: 1
PageRank: 0.36
References: 21
Authors: 4
Name                   Order  Citations  PageRank
Saptashwa Mitra        1      1          1.71
Paahuni Khandelwal     2      2          1.73
Shrideep Pallickara    3      837        92.72
Sangmi Lee Pallickara  4      170        24.46