SystemML: Declarative Machine Learning on Spark. - Citegraph

Paper Info

Title
SystemML: Declarative Machine Learning on Spark.

Abstract
The rising need for custom machine learning (ML) algorithms and the growing data sizes that require the exploitation of distributed, data-parallel frameworks such as MapReduce or Spark, pose significant productivity challenges to data scientists. Apache SystemML addresses these challenges through declarative ML by (1) increasing the productivity of data scientists as they are able to express custom algorithms in a familiar domain-specific language covering linear algebra primitives and statistical functions, and (2) transparently running these ML algorithms on distributed, data-parallel frameworks by applying cost-based compilation techniques to generate efficient, low-level execution plans with in-memory single-node and large-scale distributed operations. This paper describes SystemML on Apache Spark, end to end, including insights into various optimizer and runtime techniques as well as performance characteristics. We also share lessons learned from porting SystemML to Spark and declarative ML in general. Finally, SystemML is open-source, which allows the database community to leverage it as a testbed for further research.

Year	2016
DOI	10.14778/3007263.3007279
Venue	PVLDB
Field	Linear algebra, Data mining, Spark (mathematics), Programming language, Computer science, End-to-end principle, Testbed, Artificial intelligence, Porting, Machine learning, Database
DocType	Journal
Volume	9
Issue	13
ISSN	2150-8097
Citations	38
PageRank	0.91
References	27
Authors	11

Authors (11 rows)

Name	Order	Citations	PageRank
Matthias Boehm	1	127	6.17
Michael Dusenberry	2	38	1.59
Deron Eriksson	3	38	0.91
Alexandre V. Evfimievski	4	501	41.76
Faraz Makari Manshadi	5	48	2.23
Niketan Pansare	6	181	7.15
Berthold Reinwald	7	901	79.37
Frederick R. Reiss	8	371	17.91
Prithviraj Sen	9	837	38.24
Arvind Surve	10	38	0.91
Shirish Tatikonda	11	640	29.87
