Title
Correlation-aware partitioning for skewed range query optimization.
Abstract
Data partitioning is an effective way to reduce cost and improve query performance in large-scale Web data analytical applications. State-of-the-art partitioning approaches on range queries is lacking of considering the correlation of data in a certain access patterns, especially in some skewed patterns. This paper presents a correlation-aware partitioning model for skewed range queries. It formulates partitioning optimization issue on continuous correlated data as a geometrical step curve fitting problem. Then, we prove that the optimal partitioning should split data on range query boundaries. On this basis, Range Boundary Based DP Partitioning is designed to induce the optimal partition and significantly reduce the computation cost compared to the baseline dynamic programming algorithm. Local is better than global. For efficiency, Bottom-up Merging Partitioning is proposed further to improve partitioning by bottom-up merging instead of searching. To evaluate the proposed approaches, sets of experiments are conducted under skewed range query workloads on skewed and uniform datasets, and show they do optimize the efficiency of data partitioning by hundreds of times.
Year
DOI
Venue
2019
10.1007/s11280-018-0547-4
World Wide Web
Keywords
Field
DocType
Range query, Data skew, Partitioning algorithm, Correlation aware
Data mining,Dynamic programming,Curve fitting,Computer science,Range query (data structures),Correlation,Merge (version control),Partition (number theory),Data partitioning,Computation
Journal
Volume
Issue
ISSN
22
1
1573-1413
Citations 
PageRank 
References 
0
0.34
20
Authors
4
Name
Order
Citations
PageRank
Wei Ge12111.72
Xianxian Li211521.21
Chunfeng Yuan341830.84
Huang, Yihua416722.07