Title
MEWSE: Multi-Engine Workflow Submission and Execution on Apache YARN
Abstract
In this era of Big Data, organizations are attempting to gain new insights from their data by performing more complex analyses, often involving multiple data sets and multiple analytic techniques. Creating a workflow to perform these analyses can involve several analytic engines and require multiple transformations of the data as it flows through the steps of the task. Coordinating the jobs involved in a workflow is handled by a workflow management system. Current workflow systems, however, typically run on only one engine and do not offer the versatility required by analytic workflows. On the other hand, manually submitting the jobs to different engines can be time-consuming and requires expertise in working with the various analytic engines. In this paper we present MEWSE (Multi-Engine Workflow Submission and Execution on Apache YARN). Users submit multi-engine analytic workflows specified in an XML-based description language, and MEWSE manages the scheduling and execution of the workflow on top of a YARN cluster. MEWSE is designed with plug-and-play functionality to allow the inclusion of new engines as required. MEWSE is demonstrated on Amazon EC2 with a sample workflow that combines Hadoop, Mahout, Java programs, and scripts to process the data.
Year
2016
Venue
CASCON
Field
Workflow Management Coalition, Workflow technology, Software engineering, Computer science, Windows Workflow Foundation, XPDL, Workflow engine, Big data, Workflow management system, Workflow, Database
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
5
Name                   Order  Citations  PageRank
Kiran Sundaravarathan  1      0          0.34
Patrick Martin         2      148        18.22
D. Rope                3      7          1.98
Mike McRoberts         4      2          0.72
Craig Statchuk         5      8          3.40