Abstract |
---|
Networked observational devices and remote sensing equipment continue to proliferate and contribute to the accumulation of extreme-scale datasets. Both the rate and resolution of the readings produced by these devices have grown over time, exacerbating the issues surrounding their storage and management. In many cases, the sheer scale of the information being maintained makes timely analysis infeasible due to the computational workloads required to process the data. While distributed solutions provide a scalable way to cope with data volumes, the communication and latency involved when inspecting large portions of an overall dataset limit applications that require frequent or rapid responses to incoming queries. This study investigates the challenges associated with providing approximate or exploratory answers to distributed queries. In many situations, this requires striking a balance between response times and error rates to produce meaningful results. To enable these use cases, we outline several expressive query constructs and describe their implementation; rather than relying on summary tables or pre-computed samples, our solution involves a coarse-grained global index that maintains statistics and models the relationships across dimensions in the dataset. To illustrate the benefits of these techniques, we include performance benchmarks on a real-world dataset in a production environment. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/TCC.2015.2398437 | IEEE Trans. Cloud Computing |

Keywords | DocType | Volume |
---|---|---|
approximate query processing, ad hoc exploration, multidimensional data | Journal | PP |

Issue | ISSN | Citations |
---|---|---|
99 | 2168-7161 | 8 |

PageRank | References | Authors |
---|---|---|
0.55 | 21 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Malensek, M. | 1 | 9 | 0.91 |
Sangmi Lee Pallickara | 2 | 170 | 24.46 |
Sangmi Lee Pallickara | 3 | 170 | 24.46 |