Title
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
Abstract
We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.
Year
2012
Venue
NSDI
Keywords
fault-tolerant manner, fault-tolerant abstraction, interactive data mining tool, memory abstraction, shared memory, in-memory cluster computing, iterative job, shared state, fault tolerance, iterative algorithm, current computing framework, coarse-grained transformation
Field
Abstraction, COLA (software architecture), Spark (mathematics), Shared memory, Programming paradigm, Computer science, Distributed memory, Real-time computing, Fault tolerance, Computer cluster, Distributed computing
DocType
Conference
Citations
1255
PageRank
44.75
References
32
Authors
9
Name                  Order  Citations  PageRank
Matei Zaharia         1      9101       407.89
Mosharaf Chowdhury    2      4807       198.24
Tathagata Das         3      2580       97.96
Ankur Dave            4      1917       67.99
Justin Ma             5      2314       104.86
Murphy McCauley       6      1255       45.08
Michael J. Franklin   7      17423      1681.10
Scott Shenker         8      29892      2677.04
I. Stoica             9      21406      1710.11