A novel Spark-based multi-step forecasting algorithm for big data time series. - Citegraph

Paper Info

Title
A novel Spark-based multi-step forecasting algorithm for big data time series.

Abstract
This paper presents different scalable methods for predicting big time series, namely time series measured at high frequency. The methods are also designed to deal with arbitrary prediction horizons. The Apache Spark framework is proposed for distributed computing in order to achieve the scalability of the methods. The prediction methods have been developed using Spark’s MLlib library for machine learning. Since the library does not support multivariate regression, the prediction problem is formulated as h prediction sub-problems, where h is the number of future values to predict, that is, the prediction horizon. Furthermore, several representative methods have been chosen: decision trees, two tree-based ensemble techniques (Gradient-Boosted Trees and Random Forest), and linear regression as a reference method for comparisons. Finally, the methodology has been tested on a real-world time series of electrical demand in Spain, with a time interval of ten minutes between measurements.
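
The decomposition into h sub-problems described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' Spark implementation: the window length `w`, the helper names, and the `fit_drift` stand-in regressor are all hypothetical, chosen only so the example runs without Spark. In the paper, each sub-problem would instead be fitted with an MLlib regressor.

```python
from typing import Callable, List, Sequence, Tuple

Window = List[float]
Model = Callable[[Sequence[float]], float]


def make_subproblems(
    series: Sequence[float], w: int, h: int
) -> List[Tuple[List[Window], List[float]]]:
    """Build h single-output training sets from one series.

    Sub-problem j (0-based) pairs each window of w past values with the
    value j + 1 steps after the end of that window.
    """
    n_samples = len(series) - w - h + 1
    if n_samples < 1:
        raise ValueError("series too short for the given w and h")
    windows = [list(series[i:i + w]) for i in range(n_samples)]
    return [
        (windows, [series[i + w + j] for i in range(n_samples)])
        for j in range(h)
    ]


def fit_drift(X: List[Window], y: List[float]) -> Model:
    """Hypothetical stand-in regressor (an assumption, not from the paper):
    predicts the last observed value plus the mean offset seen in training."""
    offset = sum(t - x[-1] for x, t in zip(X, y)) / len(y)
    return lambda window: window[-1] + offset


def fit_multi_step(
    series: Sequence[float],
    w: int,
    h: int,
    fit: Callable[[List[Window], List[float]], Model] = fit_drift,
) -> List[Model]:
    """Fit one independent single-output model per step of the horizon."""
    return [fit(X, y) for X, y in make_subproblems(series, w, h)]


def predict(models: List[Model], window: Sequence[float]) -> List[float]:
    """Forecast all h future values from one window of recent values."""
    return [model(window) for model in models]


# Toy usage: a linear ramp, window of 3 past values, horizon of 2 steps.
series = [float(t) for t in range(20)]
models = fit_multi_step(series, w=3, h=2)
forecast = predict(models, [17.0, 18.0, 19.0])
```

Because the h models are trained independently, any single-output regressor can be passed as `fit`, which is what lets a library without multi-output regression handle an arbitrary horizon.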

Year	2018
DOI	10.1016/j.ins.2018.06.010
Venue	Information Sciences
Keywords	Big data, Scalable, Electricity time series, Forecasting
Field	Decision tree, Spark (mathematics), Multivariate statistics, Horizon, Algorithm, Random forest, Big data, Mathematics, Linear regression, Scalability
DocType	Journal
Volume	467
ISSN	0020-0255
Citations	3
PageRank	0.43
References	15
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Antonio Galicia	1	18	1.70
José F. Torres	2	24	2.46
Francisco Martínez-Álvarez	3	155	23.98
Alicia Troncoso Lora	4	117	12.72
