Efficient Data Blocking and Skipping Framework Applying Heuristic Rules - Citegraph

Paper Info

Title
Efficient Data Blocking and Skipping Framework Applying Heuristic Rules

Abstract
Data blocking is an effective data-skipping technique for reducing data access and shortening query response times in query engines. Data is partitioned into fine-grained, balanced blocks, and metadata is generated for each block. A query can then skip any block whose metadata shows that it contains no relevant data. The key to a good blocking strategy is therefore producing, in reasonable time, a data layout that lets queries skip most of the data. In this paper, we propose several algorithms that drastically reduce the time complexity of existing blocking strategies based on workload analysis. The cost is a relatively small loss in the estimated number of tuples that can be skipped. Through theoretical analysis, we prove that our algorithms have clearly lower time complexity than the Ward algorithm. We then present the complete blocking and skipping workflow, integrate it into Spark SQL, and evaluate it experimentally. The results show that our technique significantly improves blocking efficiency compared with the Ward algorithm while retaining nearly the same skipping ability.
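The block-skipping rule described in the abstract can be illustrated with a minimal Python sketch. For illustration only, it assumes a workload-driven metadata scheme in which each block stores a bit vector. Bit i of that vector is set when at least one tuple in the block satisfies workload filter predicate i. A conjunctive query can then skip a block if some predicate it requires is satisfied by no tuple in the block. The names `Block`, `build_blocks`, `can_skip` and `skipped_tuples`, and the toy data, are hypothetical and not taken from the paper. The sketch does not implement the paper's heuristic algorithms or the Ward algorithm. The block assignment is supplied by hand.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Predicate = Callable[[int], bool]


@dataclass
class Block:
    rows: List[int] = field(default_factory=list)
    # Bit i is set if at least one row in this block satisfies predicate i.
    features: int = 0


def build_blocks(rows: Sequence[int], assignment: Sequence[int],
                 predicates: Sequence[Predicate]) -> List[Block]:
    """Group rows into blocks and compute each block's feature bit vector.

    assignment[k] is the (hypothetical) block id chosen for rows[k] by some
    blocking strategy; how that assignment is produced is out of scope here.
    """
    blocks = [Block() for _ in range(max(assignment) + 1)]
    for row, b in zip(rows, assignment):
        blocks[b].rows.append(row)
        for i, pred in enumerate(predicates):
            if pred(row):
                blocks[b].features |= 1 << i
    return blocks


def can_skip(block: Block, query_mask: int) -> bool:
    """A conjunctive query requiring every predicate in query_mask can skip
    the block if at least one required predicate matches no row in it."""
    return (block.features & query_mask) != query_mask


def skipped_tuples(blocks: Sequence[Block], workload: Sequence[int]) -> int:
    """Total number of tuples skipped across all queries in the workload."""
    return sum(len(b.rows) for q in workload for b in blocks if can_skip(b, q))


# Toy example: predicate 0 is x < 3, predicate 1 is x > 6.
rows = [1, 2, 3, 4, 5, 6, 7, 8]
preds = [lambda x: x < 3, lambda x: x > 6]
workload = [0b01, 0b10]  # one query per predicate
good = build_blocks(rows, [0, 0, 0, 0, 1, 1, 1, 1], preds)
bad = build_blocks(rows, [0, 1, 0, 0, 1, 1, 1, 0], preds)
print(skipped_tuples(good, workload))  # 8
print(skipped_tuples(bad, workload))   # 0
```

The two hand-made assignments show why the layout matters. The same eight tuples yield 8 skipped tuples across the workload in one layout and none in the other. In this simplified model, a blocking strategy, whether clustering-based or heuristic, amounts to choosing `assignment` so that `skipped_tuples` is large. The strategy must also run in reasonable time.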

Year	2017
DOI	10.1109/ICPADS.2017.00037
Venue	2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS)
Keywords	data blocking, data skipping, workload, metadata, query response time, Spark SQL
Field	Data warehouse, Metadata, Heuristic, Spark (mathematics), Tuple, Computer science, Algorithm, Time complexity, Cluster analysis, Data access, Distributed computing
DocType	Conference
ISSN	1521-9097
ISBN	978-1-5386-3208-6
Citations	0
PageRank	0.34
References	0
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Yong Wang	1	275	92.19
Xiao-Chun Yun	2	215	41.96
Xi Wang	3	4	1.08
Shupeng Wang	4	59	19.97
Yongshang Wu	5	0	1.01

Cited by (0 rows)

References (0 rows)
