Abstract |
---|
Today’s one-pass analytics applications tend to be data-intensive in nature and require the ability to process high volumes of data efficiently. MapReduce is a popular programming model for processing large datasets using a cluster of machines. However, the traditional MapReduce model is not well-suited for one-pass analytics, since it is geared towards batch processing and requires the dataset to be fully loaded into the cluster before running analytical queries. This article examines, from a systems standpoint, what architectural design changes are necessary to bring the benefits of the MapReduce model to incremental one-pass analytics. Our empirical and theoretical analyses of Hadoop-based MapReduce systems show that the widely used sort-merge implementation for partitioning and parallel processing poses a fundamental barrier to incremental one-pass analytics, despite various optimizations. To address these limitations, we propose a new data analysis platform that employs hash techniques to enable fast in-memory processing, and a new frequent-key-based technique to extend such processing to workloads that require a large key-state space. Evaluation of our Hadoop-based prototype using real-world workloads shows that our new platform significantly improves the progress of map tasks, allows reduce progress to keep up with map progress, reduces internal data spills by up to three orders of magnitude, and enables results to be returned continuously during the job. |
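
The abstract's two ideas, hash-based in-memory aggregation and keeping frequent keys resident when the key-state space exceeds memory, can be illustrated with a small sketch. This is not the authors' implementation: the class name `FrequentKeyAggregator`, the `capacity` parameter, the Misra-Gries-style aging rule used to pick which keys stay in memory, and the in-memory list standing in for a spill file are all assumptions made for illustration.

```python
from collections import Counter


class FrequentKeyAggregator:
    """Hypothetical per-key counter with a bounded in-memory hash table."""

    def __init__(self, capacity):
        self.capacity = capacity  # max number of keys held in memory (assumed knob)
        self.state = {}           # key -> running count for resident keys
        self.score = {}           # key -> Misra-Gries-style counter used for eviction
        self.spill = []           # stand-in for a spill file: (key, partial count)

    def update(self, key, value=1):
        if key in self.state:
            # Resident key: fold the tuple into its state immediately.
            self.state[key] += value
            self.score[key] += 1
        elif len(self.state) < self.capacity:
            # Free slot: admit the new key.
            self.state[key] = value
            self.score[key] = 1
        else:
            # Table full: spill the new tuple and age every resident key.
            self.spill.append((key, value))
            for k in list(self.score):
                self.score[k] -= 1
                if self.score[k] == 0:
                    # Key looks infrequent: move its partial state to the spill.
                    self.spill.append((k, self.state.pop(k)))
                    del self.score[k]

    def snapshot(self):
        """Partial results available at any time, without waiting for a sort."""
        return dict(self.state)

    def finalize(self):
        """Merge resident state with spilled partial counts for exact totals."""
        total = Counter(self.state)
        for key, partial in self.spill:
            total[key] += partial
        return dict(total)


agg = FrequentKeyAggregator(capacity=2)
for k in ["a", "b", "a", "c", "a", "d", "a", "b"]:
    agg.update(k)
print(agg.snapshot())
print(agg.finalize())
```

In this toy stream the frequent key `a` stays resident and its running count is readable mid-job through `snapshot()`, while the rarer keys are spilled and only reconciled in `finalize()`. A sort-merge pipeline, by contrast, would have to buffer and sort the tuples before producing any per-key result.
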
Year | DOI | Venue |
---|---|---|
2012 | 10.1145/2389241.2389246 | ACM Trans. Database Syst. |

Keywords | DocType | Volume |
---|---|---|
Hadoop-based MapReduce system, in-memory processing, Scalable One-Pass, internal data spill, parallel processing, traditional MapReduce model, batch processing, MapReduce model, analytics application, one-pass analytics, incremental one-pass analytics | Journal | 37 |

Issue | ISSN | Citations |
---|---|---|
4 | 0362-5915 | 11 |

PageRank | References | Authors |
---|---|---|
0.57 | 32 | 5 |

Name | Order | Citations | PageRank |
---|---|---|---|
Boduo Li | 1 | 202 | 8.65 |
Edward Mazur | 2 | 102 | 4.10 |
Yanlei Diao | 3 | 2234 | 108.95 |
Andrew McGregor | 4 | 1340 | 64.31 |
Prashant J. Shenoy | 5 | 6386 | 521.30 |