Title
Low-Latency Analytics on Colossal Data Streams with SummaryStore
Abstract
SummaryStore is an approximate time-series store, designed for analytics, capable of storing large volumes of time-series data (~1 petabyte) on a single node; it preserves high degrees of query accuracy and enables near real-time querying at unprecedented cost savings. SummaryStore contributes time-decayed summaries, a novel abstraction for summarizing data streams, along with an ingest algorithm to continually merge the summaries for efficient range queries; in conjunction, it returns reliable error estimates alongside the approximate answers, supporting a range of machine learning and analytical workloads. We successfully evaluated SummaryStore using real-world applications for forecasting, outlier detection, and Internet traffic monitoring; it can summarize aggressively with low median errors, 0.1 to 10%, for different workloads. Under range-query microbenchmarks, it stored 1 PB of synthetic stream data (1024 1 TB streams), on a single node, using roughly 10 TB (100x compaction) with 95%-ile error below 5% and median cold-cache query latency of 1.3s (worst case latency under 70s).
Year
2017
DOI
10.1145/3132747.3132758
Venue
SOSP '17: ACM SIGOPS 26th Symposium on Operating Systems Principles, Shanghai, China, October 2017
Field
Query optimization, Data mining, Anomaly detection, Data stream mining, Computer science, Petabyte, Range query (data structures), Real-time computing, Latency (engineering), Analytics, Internet traffic
DocType
Conference
ISBN
978-1-4503-5085-3
Citations
4
PageRank
0.38
References
30
Authors
2
Name             Order  Citations  PageRank
Nitin Agrawal    1      999        56.74
Ashish Vulimiri  2      187        8.44