Title
Optimal splitters for database partitioning with size bounds
Abstract
Partitioning is an important step in several database algorithms, including sorting, aggregation, and joins. Partitioning is also fundamental for dividing work into equal-sized (or balanced) parallel subtasks. In this paper, we aim to find, materialize and maintain a set of partitioning elements (splitters) for a data set. Unlike traditional partitioning elements, our splitters define both inequality and equality partitions, which allows us to bound the size of the inequality partitions. We provide an algorithm for determining an optimal set of splitters from a sorted data set and show that it has time complexity O(k lg2 N), where k is the number of splitters requested and N is the size of the data set. We show how the algorithm can be extended to pairs of tables, so that joins can be partitioned into work units that have balanced cost. We demonstrate experimentally (a) that finding the optimal set of splitters can be done efficiently, and (b) that using the precomputed splitters can improve the time to sort a data set by up to 76%, with particular benefits in the presence of a few heavy hitters.
Year
DOI
Venue
2009
10.1145/1514894.1514907
ICDT
Keywords
Field
DocType
traditional partitioning element,inequality partition,balanced cost,precomputed splitters,time complexity o,partitioning element,database algorithm,optimal splitters,work unit,k lg2,size bound,time complexity,data exchange,computer science,data integration
Data integration,Joins,Data exchange,Division (mathematics),Computer science,sort,Algorithm,Sorting,Theoretical computer science,Schema mapping,Time complexity,Database
Conference
Citations 
PageRank 
References 
11
1.03
25
Authors
2
Name
Order
Citations
PageRank
Kenneth A. Ross14110750.80
John Cieslewicz233519.95