PoBery: Possibly-complete Big Data Queries with Probabilistic Data Placement and Scanning - Citegraph

Paper Info

Title
PoBery: Possibly-complete Big Data Queries with Probabilistic Data Placement and Scanning

Abstract
In big data query processing, there is a trade-off between query accuracy and query efficiency; for example, sampling-based query approaches trade off query completeness for efficiency. In this article, we argue that query performance can be significantly improved by slightly lowering the possibility of query completeness, that is, the chance that a query is complete. To quantify this possibility, we define a new concept, Probability of query Completeness (hereinafter referred to as PC). For example, if a query is executed 100 times, PC = 0.95 guarantees that there are no more than 5 incomplete results among the 100 results. Leveraging probabilistic data placement and scanning, we trade off PC for query performance. In the article, we propose PoBery (POssibly-complete Big data quERY), a method that supports neither complete queries nor incomplete queries, but possibly-complete queries. The experimental results on HiBench show that PoBery can significantly accelerate queries while ensuring the PC. Specifically, it is guaranteed that the percentage of complete queries is larger than the given PC confidence. Through comparison with state-of-the-art key-value stores, we show that while Drill-based PoBery performs as fast as Drill on complete queries, it is 1.7×, 1.1×, and 1.5× faster on average than Drill, Impala, and Hive, respectively, on possibly-complete queries.
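
The PC notion in the abstract can be illustrated with a small simulation. The sketch below is not the PoBery algorithm; it assumes a toy model in which a query needs `num_relevant_blocks` data blocks and each block is scanned independently with probability `scan_prob`, so the query is complete only when every relevant block is scanned. All function and parameter names are hypothetical and chosen for illustration only.

```python
import random


def run_query(num_relevant_blocks: int, scan_prob: float, rng: random.Random) -> bool:
    """Toy model (an assumption, not the paper's method): the query is
    complete only if every relevant block happens to be scanned."""
    return all(rng.random() < scan_prob for _ in range(num_relevant_blocks))


def empirical_pc(runs: int, num_relevant_blocks: int, scan_prob: float, seed: int = 0) -> float:
    """Fraction of complete results over repeated executions of the same query."""
    rng = random.Random(seed)
    complete = sum(run_query(num_relevant_blocks, scan_prob, rng) for _ in range(runs))
    return complete / runs


def analytic_pc(num_relevant_blocks: int, scan_prob: float) -> float:
    """Under the independence assumption above, PC = scan_prob ** num_relevant_blocks."""
    return scan_prob ** num_relevant_blocks


if __name__ == "__main__":
    # Compare the simulated fraction of complete queries with the closed form.
    print("empirical PC:", empirical_pc(runs=20000, num_relevant_blocks=5, scan_prob=0.99))
    print("analytic PC: ", analytic_pc(num_relevant_blocks=5, scan_prob=0.99))
```

In this toy setting, lowering `scan_prob` reduces the scanning work per query but also lowers PC, which mirrors the trade-off the abstract describes in general terms.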

Field	Value
Year	2021
DOI	10.1145/3465375
Venue	ACM/IMS Transactions on Data Science
DocType	Journal
Volume	2
Issue	3
ISSN	2691-1922
Citations	0
PageRank	0.34
References	0
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Jie Song	1	0	0.34
Qiang He	2	204	18.15
Feifei Chen	3	296	22.56
Ye Yuan	4	82	6.46
Ye Yuan	5	117	24.40
