PigOut: Making multiple Hadoop clusters work together

Abstract
This paper presents PigOut, a system that enables federated data processing over multiple Hadoop clusters. Using PigOut, a user (such as a data analyst) can write a single script in a high-level language to efficiently use multiple Hadoop clusters. There is no need to manually write multiple scripts and coordinate the execution for different clusters. PigOut accomplishes this by automatically partitioning a single, user-supplied script into multiple scripts that run on different clusters. Additionally, PigOut generates workflow descriptions to coordinate execution across clusters. In doing so, PigOut leverages existing tools built around Hadoop, avoiding extra effort required from users or administrators. For example, PigOut uses Pig Latin, a popular query language for Hadoop MapReduce, in a (virtually) unmodified form. Through our evaluation with PigMix, the standard benchmark for Pig, we demonstrate that PigOut's automatically-generated scripts and workflow definitions have comparable performance to manual, hand-tuned ones. We also report our experience with manually writing multiple scripts for a set of federated clusters, and compare the process with PigOut's automated approach.

Year	2014
DOI	10.1109/BigData.2014.7004218
Venue	BigData Conference
DocType	Conference
ISSN	2639-1589
Citations	2
PageRank	0.40
References	10
Authors	6
Keywords	parallel processing, pattern clustering, pig latin, high-level language, query languages, pigmix, workflow descriptions, high level languages, pigout automatically-generated scripts, federated data processing, hadoop clusters, user-supplied script, data handling, query language, hadoop mapreduce
Field	Data mining, Cluster (physics), Query language, Data processing, Computer science, Workflow, Database, Scripting language

Authors (6 rows)

Name	Order	Citations	PageRank
Kyungho Jeon	1	78	5.68
Sharath Chandrashekhara	2	5	3.22
Feng Shen	3	9	1.28
Shikhar Mehra	4	2	0.40
Oliver Kennedy	5	3	0.77
Steven Y. Ko	6	471	45.08