Abstract | ||
---|---|---|
Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data. It scales to thousands of nodes in a fault-tolerant manner. Shark can answer queries 40X faster than Apache Hive and run machine learning programs 25X faster than MapReduce programs in Apache Hadoop on large datasets. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1145/2213836.2213934 | SIGMOD Conference |
Keywords | Field | DocType |
research data analysis system,unified system,apache hive,easy data manipulation,sophisticated analysis,fault-tolerant manner,deep data analysis,large datasets,fast data analysis,mapreduce program,apache hadoop,data processing,distributed shared memory,data warehouse,databases,distributed memory,machine learning,fault tolerant,data analysis,spark | SQL,Data warehouse,Data mining,Data processing,Programming language,Spark (mathematics),Abstraction,Computer science,Distributed memory,Data manipulation language,Database | Conference |
Citations | PageRank | References |
59 | 2.64 | 19 |
Authors | ||
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Cliff Engle | 1 | 59 | 2.64 |
Antonio Lupher | 2 | 59 | 2.64 |
Reynold Xin | 3 | 2171 | 81.33 |
Matei Zaharia | 4 | 9101 | 407.89 |
Michael J. Franklin | 5 | 17423 | 1681.10 |
Scott Shenker | 6 | 29892 | 2677.04 |
Ion Stoica | 7 | | |