Algorithmic Aspects of Parallel Data Processing. - Citegraph

Paper Info

Title
Algorithmic Aspects of Parallel Data Processing.

Abstract
In the last decade or so we have witnessed a growing interest in processing large data sets on large distributed clusters. The idea was pioneered by the MapReduce framework, and has been widely adopted by several other systems, including PigLatin, Hive, Scope, U-SQL, Dremel, Spark and Myria. A large part of the complex data analysis performed by these systems consists of a sequence of relatively simple query operations, such as joining two or more tables. This survey discusses recent algorithmic developments for distributed data processing. It uses a theoretical model of parallel processing called the Massively Parallel Computation (MPC) model, which is a simplification of the BSP model where the only cost is given by the amount of communication and the number of communication rounds. The survey studies several algorithms for multi-join queries, for sorting, and for matrix multiplication, and discusses their relationships and common techniques applied across the different data processing tasks.

Year	DOI	Venue
2018	10.1561/1900000055	FOUNDATIONS AND TRENDS IN DATABASES
Keywords	Field	DocType
Databases,Parallel and Distributed Database Systems,Query Processing and Optimization	Data set,Data processing,Spark (mathematics),Data analysis,Computer science,myria-,Filter (signal processing),Theoretical computer science,Sorting,Matrix multiplication	Journal
Volume	Issue	ISSN
8	4	1931-7883
Citations	PageRank	References
1	0.36	0
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Paraschos Koutris	1	347	26.63
Semih Salihoglu	2	433	24.83
Dan Suciu	3	9625	1349.54
