Title
Salvaging failing and straggling queries
Abstract
Interactive response times are a crucial requirement for users analyzing large amounts of data, typically stored in a relational-style data warehouse where data is partitioned across thousands of nodes for high efficiency and throughput. However, consistently providing quick responses remains a big challenge for two reasons: (1) with data distributed across thousands of nodes, it is highly likely that some nodes are unavailable or very slow during query execution, and (2) a large number of users results in high resource contention, which exacerbates the problem of slow and failing nodes. In such situations, systems typically let the query straggle or fail it, resulting in higher latencies and wasted resources. In this paper, we propose a novel solution to alleviate the failure/straggling problem: use the intermediate results from the partial query execution over available data, and exploit the statistical properties of efficiently partitioned data, particularly co-hash partitioned data, to provide approximate answers along with confidence bounds. The proposed approach handles aggregate queries that involve joins, group-bys, having clauses, and a subclass of nested subqueries, covering a large portion of analytical queries. We validate our approach through extensive experiments on the TPC-H dataset and observe that even with data availability as low as 1%, our proposed solution provides answers with less than 5% error.
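To make the idea of an approximate answer with confidence bounds concrete, here is a minimal sketch. It is not the paper's estimator. It assumes, purely for illustration, that the partitions that responded behave like a simple random sample of all partitions, and it scales their partial SUMs up to an estimate of the full SUM with a normal-approximation interval. The function name `estimate_sum` and its parameters are hypothetical.

```python
import math
import statistics


def estimate_sum(partial_sums, total_partitions, z=1.96):
    """Estimate a full SUM from the partial sums of available partitions.

    Illustrative assumption (not taken from the paper): the available
    partitions are a simple random sample of all partitions.
    Returns (estimate, (lower_bound, upper_bound)).
    """
    n = len(partial_sums)
    if n < 2 or n > total_partitions:
        raise ValueError("need between 2 and total_partitions partial sums")

    # Scale the mean per-partition sum up to the full partition count.
    estimate = total_partitions * statistics.fmean(partial_sums)

    # Sample variance of the partial sums, with a finite population
    # correction so the interval shrinks to zero when all data is seen.
    variance = statistics.variance(partial_sums)
    fpc = 1.0 - n / total_partitions
    std_err = total_partitions * math.sqrt(fpc * variance / n)

    return estimate, (estimate - z * std_err, estimate + z * std_err)


if __name__ == "__main__":
    # Four of one hundred partitions answered before the deadline.
    est, (low, high) = estimate_sum([10.0, 12.0, 11.0, 9.0], 100)
    print(f"estimated SUM = {est:.1f}, 95% interval = [{low:.1f}, {high:.1f}]")
```

The interval widens as fewer partitions respond and collapses to a single point once every partition is available. Handling joins, grouping, and nested subqueries, as the abstract describes, requires more than this single-aggregate sketch.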
Year
2022
DOI
10.1109/ICDE53745.2022.00108
Venue
2022 IEEE 38th International Conference on Data Engineering (ICDE 2022)
DocType
Conference
ISSN
1084-4627
Citations
0
PageRank
0.34
References
0
Authors
4
Name | Order | Citations | PageRank
Bruhathi Sundarmurthy | 1 | 0 | 0.34
Harshad Deshmukh | 2 | 0 | 0.34
Paraschos Koutris | 3 | 347 | 26.63
Jeffrey F. Naughton | 4 | 0 | 0.34