Equi-join for Multiple Datasets Based on Time Cost Evaluation Model. - Citegraph

Paper Info

Title
Equi-join for Multiple Datasets Based on Time Cost Evaluation Model.

Abstract
MapReduce is an important programming model for processing big data with a parallel, distributed algorithm on a cluster. In big data analytic application, equi-join is an important operation. However, it is inefficient to perform equi-join operations in MapReduce when multiple datasets are involved in the join. In this paper, a time cost evaluation model is extended for an equi-join by considering the time cost of calculation. In addition, the sub-joins in an equi-join are classified into star pattern sub-joins on single attribute and chain pattern sub-joins. Based on the extended model, optimization methods are presented and an equi-join plan with lower time cost is chosen for the equi-join. The optimization methods include: the star pattern sub-joins on one attribute are first processed; next, a chain pattern sub-join with minimal scale of intermediate results (i.e. the number of tuples in intermediate results) is processed; at last, a chain pattern sub-join is decomposed into several MapReduce jobs or single MapReduce job by dynamic programming to obtain an optimal scheme for the chain pattern sub-join. We conducted extensive experiments, and the results show that our method is more efficient than those methods such as MDMJ, Hive and Pig.

Year	DOI	Venue
2014	10.1007/978-3-319-11194-0_10	ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2014, PT II
Keywords	Field	DocType
Join,MapReduce,Dynamic Programming	Dynamic programming,Programming paradigm,Tuple,Computer science,Parallel computing,Cost evaluation,Distributed algorithm,Big data	Conference
Volume	ISSN	Citations
8631	0302-9743	0
PageRank	References	Authors
0.34	9	4

Authors (4 rows)

Cited by (0 rows)

References (9 rows)

Name	Order	Citations	PageRank
Hong Zhu	1	47	7.49
Libo Xia	2	0	0.34
Hong Zhu	3	7	5.52
Ke Yan	4	3	1.75

1