Efficient Data Redistribution to Speedup Big Data Analytics in Large Systems - Citegraph

Paper Info

Title
Efficient Data Redistribution to Speedup Big Data Analytics in Large Systems

Abstract
The performance of parallel data analytics systems becomes increasingly important with the rise of Big Data. An essential operation in such environments is the parallel join, which always incurs significant network communication cost. State-of-the-art approaches have achieved performance improvements over conventional implementations by minimizing network traffic or communication time. However, these approaches still face performance issues in the presence of big data and/or large-scale systems, due to the heavy overhead of their data redistribution scheduling. In this paper, we propose near-join, a network-aware redistribution approach that aims to efficiently reduce both the network traffic and the communication time of join executions. In particular, near-join is lightweight and adaptable to processing large datasets over large systems. We present the details of our algorithm and its implementation. Experiments performed on a cluster of up to 400 nodes and datasets of about 100 GB demonstrate that our scheduling algorithm is much faster than the state-of-the-art methods. Moreover, our join implementation also achieves speedups over the conventional approaches.

Year	DOI	Venue
2016	10.1109/HiPC.2016.020	2016 IEEE 23rd International Conference on High Performance Computing (HiPC)
Keywords	Field	DocType
data analytics,parallel joins,data locality,data-intensive computing,high performance computing	Data analysis,Data-intensive computing,Supercomputer,Scheduling (computing),Computer science,Parallel computing,Implementation,Redistribution (cultural anthropology),Big data,Distributed computing,Speedup	Conference
ISSN	ISBN	Citations
1094-7256	978-1-5090-5412-1	0
PageRank	References	Authors
0.34	16	2

Authors (2 rows)

Name	Order	Citations	PageRank
Long Cheng	1	91	16.99
Tao Li	2	143	54.36