Title
Multi-core, main-memory joins: sort vs. hash revisited
Abstract
In this paper we experimentally study the performance of main-memory, parallel, multi-core join algorithms, focusing on sort-merge and (radix-)hash join. The relative performance of these two join approaches have been a topic of discussion for a long time. With the advent of modern multi-core architectures, it has been argued that sort-merge join is now a better choice than radix-hash join. This claim is justified based on the width of SIMD instructions (sort-merge outperforms radix-hash join once SIMD is sufficiently wide), and NUMA awareness (sort-merge is superior to hash join in NUMA architectures). We conduct extensive experiments on the original and optimized versions of these algorithms. The experiments show that, contrary to these claims, radix-hash join is still clearly superior, and sort-merge approaches to performance of radix only when very large amounts of data are involved. The paper also provides the fastest implementations of these algorithms, and covers many aspects of modern hardware architectures relevant not only for joins but for any parallel data processing operator.
Year
DOI
Venue
2013
10.14778/2732219.2732227
VLDB
Field
DocType
Volume
Hash join,Joins,Recursive join,Computer science,sort,Parallel computing,SIMD,Hash function,Operator (computer programming),Multi-core processor,Database
Journal
7
Issue
ISSN
Citations 
1
2150-8097
89
PageRank 
References 
Authors
2.21
21
4
Name
Order
Citations
PageRank
Cagri Balkesen11685.70
Gustavo Alonso25476612.79
Jens Teubner3146487.09
M. Tamer Özsu44504582.63