Title
Robust and Skew-resistant Parallel Joins in Shared-Nothing Systems
Abstract
The performance of joins in parallel database management systems is critical for data-intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements over naive implementations. However, performance could be further improved by removing the dependency on global skew knowledge and broadcasting. In this paper, we propose PRPQ (partial redistribution & partial query), an efficient and robust join algorithm for processing large-scale joins over distributed systems. We present the detailed implementation and a quantitative evaluation of our method. The experimental results demonstrate that the proposed PRPQ algorithm is robust and scalable under a wide range of skew conditions. Specifically, compared to the state-of-the-art PRPD method, we achieve a 16% to 167% performance improvement and 24% to 54% less network communication under different join workloads.
Year: 2014
DOI: 10.1145/2661829.2661888
Venue: CIKM
Keywords: distributed joins, systems, parallel joins, performance, prpd, prpq
Field: Broadcasting, Joins, Computer science, Parallel database, Shared nothing architecture, Implementation, Skew, Performance improvement, Scalability, Distributed computing
DocType: Conference
Citations: 16
PageRank: 0.62
References: 24
Authors: 4
Name                      Order   Citations   PageRank
Long Cheng                1       91          16.99
Spyros Kotoulas           2       590         46.46
Tomas E. Ward             3       104         19.10
Georgios Theodoropoulos   4       332         31.39