Title
External Merge Sort for Top-K Queries: Eager input filtering guided by histograms
Abstract
Business intelligence and web log analysis workloads often use queries with top-k clauses to produce the most relevant results. Values ofk range from small to rather large and sometimes the requested output exceeds the capacity of the available main memory. When the requested output fits in the available memory existing top-k algorithms are efficient, as they can eliminate almost all but the topk results before sorting them. When the requested output exceeds the main memory capacity, existing algorithms externally sort the entire input, which can be very expensive. Furthermore, the drastic difference in execution cost when the memory capacity is exceeded results in an unpleasant user experience. Every day, tens of thousands of production top-k queries executed on F1 Query resort to an external sort of the input. To address these challenges, we introduce a new top-k algorithm that is able to eliminate parts of the input before sorting or writing them to secondary storage, regardless of whether the requested output fits in the available memory. To achieve this, at execution time our algorithm creates a concise model of the input using histograms. The proposed algorithm is implemented as part of F1 Query and is used in production, where significantly accelerates top-k queries with outputs larger than the available memory. We evaluate our algorithm against existing top-k algorithms and show that it reduces I/O traffic and can be up to 11 times faster.
Year
DOI
Venue
2020
10.1145/3318464.3389729
SIGMOD/PODS '20: International Conference on Management of Data Portland OR USA June, 2020
Keywords
DocType
ISBN
Top-K, Query Operators, Out-of-core
Conference
978-1-4503-6735-6
Citations 
PageRank 
References 
0
0.34
0
Authors
4
Name
Order
Citations
PageRank
Yannis Chronis131.72
Thanh Do201.35
Goetz Graefe32972716.84
Keith Peters400.68