Big Data Analytics with Spark - Citegraph

Paper Info

Title
Big Data Analytics with Spark

Abstract
Born from a Berkeley graduate project, the Apache Spark library has grown to become the most widely used big data analytics platform. While Spark integrates with the older Hadoop ecosystem, it provides abstractions for manipulating distributed data that are more intuitive, faster, and more powerful than those of MapReduce. In this workshop, we will cover the basics of the Spark library with the goal of getting participants up to speed so that they can use it, or teach it, in courses that involve big data or distributed processing. Participants will work through examples ranging from calculating basic summary statistics to using Spark's machine learning library (MLlib) to perform sophisticated analyses on large datasets. During the session, tasks will be performed on smaller data samples by running Spark in local mode on participants' own laptops. We will also discuss how Spark can be run on a local or cloud-based cluster and point participants toward resources for setting up those environments for their students.

Year	2019
DOI	10.1145/3287324.3287551
Venue	Proceedings of the 50th ACM Technical Symposium on Computer Science Education
Keywords	big data, data science, distributed computing, spark
Field	Abstraction, Spark (mathematics), Computer science, Multimedia, Big data, Cloud computing
DocType	Conference
ISBN	978-1-4503-5890-3
Citations	1
PageRank	0.39
References	0
Authors	1

Authors

Name	Order	Citations	PageRank
Mark C. Lewis	1	24	5.04
