Title
Shark: fast data analysis using coarse-grained distributed memory
Abstract
Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data. It scales to thousands of nodes in a fault-tolerant manner. Shark can answer queries 40X faster than Apache Hive and run machine learning programs 25X faster than MapReduce programs in Apache Hadoop on large datasets.
Year
2012
DOI
10.1145/2213836.2213934
Venue
SIGMOD Conference
Keywords
research data analysis system, unified system, apache hive, easy data manipulation, sophisticated analysis, fault-tolerant manner, deep data analysis, large datasets, fast data analysis, mapreduce program, apache hadoop, data processing, distributed shared memory, data warehouse, databases, distributed memory, machine learning, fault tolerant, data analysis, spark
Field
SQL, Data warehouse, Data mining, Data processing, Programming language, Spark (mathematics), Abstraction, Computer science, Distributed memory, Data manipulation language, Database
DocType
Conference
Citations
59
PageRank
2.64
References
19
Authors
7
Name                 Order  Citations  PageRank
Cliff Engle          1      59         2.64
Antonio Lupher       2      59         2.64
Reynold Xin          3      2171       81.33
Matei Zaharia        4      9101       407.89
Michael J. Franklin  5      17423      1681.10
Scott Shenker        6      29892      2677.04
Scott Shenker        7      29892      2677.04