Abstract |
---|
In this era of Big Data, organizations are attempting to gain new insights from their data by performing more complex analyses, often involving multiple data sets and multiple analytic techniques. Creating a workflow to perform these analyses can involve several analytic engines and require multiple transformations of the data as it flows through the steps of the task. Coordinating the jobs involved in a workflow is handled by a workflow management system. Current workflow systems, however, typically run on only one engine and do not offer the versatility required by analytic workflows. On the other hand, submitting the jobs to different engines manually can be time-consuming and requires expertise with the various analytic engines. In this paper we present MEWSE - Multi Engine Workflow Submission and Execution on Apache YARN. Users submit multi-engine analytic workflows specified in an XML-based description language, and MEWSE manages the scheduling and execution of the workflow on top of a YARN cluster. MEWSE is designed with plug-and-play functionality to allow the inclusion of new engines as required. MEWSE is demonstrated on Amazon EC2 with a sample workflow that combines Hadoop, Mahout, Java programs, and scripts to process the data. |
Year | Venue | Field |
---|---|---|
2016 | CASCON | Workflow Management Coalition, Workflow technology, Software engineering, Computer science, Windows Workflow Foundation, XPDL, Workflow engine, Big data, Workflow management system, Workflow, Database |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kiran Sundaravarathan | 1 | 0 | 0.34 |
Patrick Martin | 2 | 148 | 18.22 |
D. Rope | 3 | 7 | 1.98 |
Mike McRoberts | 4 | 2 | 0.72 |
Craig Statchuk | 5 | 8 | 3.40 |