A comparison of approaches to large-scale data analysis - Citegraph

Paper Info

Title
A comparison of approaches to large-scale data analysis

Abstract
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system's performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.

Year	2009
DOI	10.1145/1559845.1559865
Venue	SIGMOD Conference
Keywords	observed performance,parallel sql database management,dramatic performance difference,basic control flow,large-scale data analysis,mr system,parallel dbmss,development complexity,future system,considerable enthusiasm,database management system,computer model,data analysis,control flow
Field	Data mining,Parallel database,Computer science,Parallel computing,Control flow,Sql database,Management system,Database
DocType	Conference
Citations	515
PageRank	62.74
References	12
Authors	7

Authors (7 rows)

Name	Order	Citations	PageRank
Andrew Pavlo	1	1614	122.03
Erik Paulson	2	814	81.47
Alexander Rasin	3	2950	209.48
Daniel J. Abadi	4	6163	367.24
David J. DeWitt	5	12943	3559.25
Samuel Madden	6	16101	1176.38
Michael Stonebraker	7	12463	4310.17
