Paper Info

Title
Building near-real-time processing pipelines with the spark-MPI platform

Abstract
Advances in detectors and computational technologies provide new opportunities for applied research and the fundamental sciences. Concurrently, dramatic increases in the three V's (Volume, Velocity, and Variety) of experimental data and in the scale of computational tasks have created a demand for new real-time processing systems at experimental facilities. Recently, this demand was addressed by the Spark-MPI approach, which connects the Spark data-intensive platform with the MPI high-performance framework. In contrast with existing data management and analytics systems, Spark introduced a new middleware based on resilient distributed datasets (RDDs), which decoupled various data sources from high-level processing algorithms. The RDD middleware significantly broadened the scope of data-intensive applications, which now range from SQL queries to machine learning to graph processing. Spark-MPI further extended the Spark ecosystem with MPI applications using the Process Management Interface. The paper explores this integrated platform within the context of online ptychographic and tomographic reconstruction pipelines.

Year	2018
DOI	10.1109/NYSDS.2017.8085039
Venue	2017 New York Scientific Data Summit (NYSDS)
Keywords	streaming, high-performance, data analysis, experimental facility, Spark, MPI
Field	SQL, Middleware, Pipeline transport, Spark (mathematics), Computer science, Real-time computing, Applied research, Analytics, Detector, Data management, Distributed computing
DocType	Journal
Volume	abs/1805.04886
ISBN	978-1-5386-3162-1
Citations	2
PageRank	0.52
References	9
Authors	7

Authors (7 rows)

Name	Order	Citations	PageRank
N. Malitsky	1	2	1.53
Aashish Chaudhary	2	3	1.20
Sebastien Jourdain	3	38	3.47
Matt Cowan	4	2	0.86
Patrick O'Leary	5	2	0.52
Marcus D. Hanwell	6	115	9.51
Kerstin Kleese van Dam	7	9	1.64
