Title
Algorithmic Aspects of Parallel Query Processing.
Abstract
In the last decade we have witnessed a growing interest in processing large data sets on large-scale distributed clusters. A big part of the complex data analysis pipelines performed by these systems consists of a sequence of relatively simple query operations, such as joining two or more tables, or sorting. This tutorial discusses several recent algorithmic developments for data processing in such large distributed clusters. It uses as a model of computation the Massively Parallel Computation (MPC) model, a simplification of the BSP model, where the only cost is given by the amount of communication and the number of communication rounds. Based on the MPC model, we study and analyze several algorithms for three core data processing tasks: multiway join queries, sorting and matrix multiplication. We discuss the common algorithmic techniques across all tasks, relate the algorithms to what is used in practical systems, and finally present open problems for future research.
Year
2018
DOI
10.1145/3183713.3197388
Venue
SIGMOD/PODS '18: International Conference on Management of Data, Houston, TX, USA, June 2018
Keywords
Distributed Query Evaluation, Bulk Synchronous Parallel Model
Field
Data mining, Cluster (physics), Data set, Data processing, Computer science, Work in process, Parallel computing, Complex data type, Sorting, Model of computation, Matrix multiplication
DocType
Conference
ISSN
0730-8078
ISBN
978-1-4503-4703-7
Citations
0
PageRank
0.34
References
21
Authors
3
Name | Order | Citations | PageRank
Paraschos Koutris | 1 | 347 | 26.63
Semih Salihoglu | 2 | 433 | 24.83
Dan Suciu | 3 | 9625 | 1349.54