Title
Mrs: High Performance MapReduce for Iterative and Asynchronous Algorithms in Python
Abstract
Mrs [1] is a lightweight Python-based MapReduce implementation designed to make MapReduce programs easy to write and quick to run, which makes it particularly useful for research and academia. Iterative algorithms, such as those frequently found in machine learning, are a common class of algorithms that would benefit from Mrs; however, they typically perform poorly in the MapReduce framework, and therefore potentially in Mrs as well. We therefore propose four modifications to the original Mrs intended to improve its performance on iterative algorithms. First, we use direct task-to-task communication for most iterations and only occasionally write to a distributed file system to preserve fault tolerance. Second, we combine the reduce and map tasks that span successive iterations to eliminate unnecessary communication and scheduling latency. Third, we propose a generator-callback programming model to allow greater flexibility in the scheduling of tasks. Finally, because some iterative algorithms are naturally expressed in terms of asynchronous message passing, we propose a fully asynchronous variant of MapReduce. We then demonstrate Mrs' enhanced performance on two iterative applications: particle swarm optimization (PSO) and expectation maximization (EM).
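The second modification (combining the reduce task of one iteration with the map task of the next) can be illustrated with a minimal in-memory sketch. This is not the Mrs API and not the authors' implementation: `map_fn`, `reduce_fn`, `shuffle`, `reducemap`, `run_separate` and `run_fused` are hypothetical names chosen for illustration, and the toy averaging computation is an assumption made only to have something to iterate.

```python
from collections import defaultdict

def map_fn(key, value):
    # Toy map (assumed): send the value to its own key and to a neighbour key.
    yield key, value
    yield (key + 1) % 4, value

def reduce_fn(key, values):
    # Toy reduce (assumed): average all values that arrived at this key.
    yield key, sum(values) / len(values)

def shuffle(pairs):
    # Group intermediate (key, value) pairs by key, as the shuffle phase would.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def run_separate(data, iterations):
    # Baseline: every iteration runs a map phase, then a separate reduce phase.
    for _ in range(iterations):
        mapped = [kv for k, v in data for kv in map_fn(k, v)]
        data = [kv for k, vs in shuffle(mapped).items() for kv in reduce_fn(k, vs)]
    return sorted(data)

def reducemap(key, values):
    # Fused task: reduce for iteration i feeds map for iteration i + 1 directly,
    # so the reduce output never has to be stored and rescheduled in between.
    for k, v in reduce_fn(key, values):
        yield from map_fn(k, v)

def run_fused(data, iterations):
    # One initial map, (iterations - 1) fused reduce-map tasks, one final reduce.
    mapped = [kv for k, v in data for kv in map_fn(k, v)]
    for _ in range(iterations - 1):
        mapped = [kv for k, vs in shuffle(mapped).items() for kv in reducemap(k, vs)]
    return sorted(kv for k, vs in shuffle(mapped).items() for kv in reduce_fn(k, vs))
```

Both drivers compute the same result; the fused version simply halves the number of task boundaries per iteration, which is where the communication and scheduling latency mentioned in the abstract would be saved in a distributed setting.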
Year: 2016
DOI: 10.1109/PyHPC.2016.10
Venue: PyHPC@SC
Keywords: MapReduce, high-level parallel programming frameworks, iterative algorithms
DocType: Conference
ISBN: 978-1-5090-5221-9
Citations: 0
PageRank: 0.34
References: 17
Authors (4)
Name              Order  Citations  PageRank
Jeffrey Lund      1      13         3.81
Chace Ashcraft    2      3          0.76
Andrew W. McNabb  3      85         4.66
Kevin D. Seppi    4      335        41.46