Title
A novel Spark-based multi-step forecasting algorithm for big data time series.
Abstract
This paper presents scalable methods for forecasting big data time series, that is, time series measured at high frequency. The methods handle arbitrary prediction horizons. Apache Spark is used for distributed computing to make the methods scalable, and the prediction methods are built on Spark’s MLlib machine learning library. Because the library does not support multivariate regression, the forecasting problem is formulated as h prediction sub-problems, where h is the prediction horizon, that is, the number of future values to predict. Several representative methods have been chosen: decision trees, two tree-based ensemble techniques (gradient-boosted trees and random forests), and linear regression as a baseline for comparison. Finally, the methodology has been tested on a real-world time series of electricity demand in Spain, sampled every ten minutes.
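The decomposition into h sub-problems described in the abstract can be illustrated with a minimal, plain-Python sketch. It is not the authors' Spark implementation: the function name `build_subproblems`, the use of a window of `w` past values as features, and the toy data are all assumptions made for illustration. Each of the h resulting datasets has a single target, so it could be handed to any single-output regressor.

```python
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (features, target)


def build_subproblems(series: List[float], w: int, h: int) -> List[List[Sample]]:
    """Split a multi-step forecasting task into h single-output datasets.

    Hypothetical helper: sub-problem j (0-based) pairs each window of w past
    values with the value j + 1 steps after the end of that window.
    """
    if w <= 0 or h <= 0:
        raise ValueError("w and h must be positive")
    # Number of windows that still leave h future values available.
    n_windows = len(series) - w - h + 1
    datasets: List[List[Sample]] = [[] for _ in range(h)]
    for start in range(max(n_windows, 0)):
        features = series[start:start + w]
        for j in range(h):
            datasets[j].append((features, series[start + w + j]))
    return datasets


# Toy series (made-up values, not the Spanish demand data).
demand = [float(v) for v in range(10)]
subproblems = build_subproblems(demand, w=3, h=2)
```

Here `subproblems[0]` holds the training pairs for the first future value and `subproblems[1]` those for the second; one regression model would be trained per list, and the h predictions are then concatenated to form the full forecast.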
Year
2018
DOI
10.1016/j.ins.2018.06.010
Venue
Information Sciences
Keywords
Big data, Scalable, Electricity time series, Forecasting
Field
Decision tree, Spark (mathematics), Multivariate statistics, Horizon, Algorithm, Random forest, Big data, Mathematics, Linear regression, Scalability
DocType
Journal
Volume
467
ISSN
0020-0255
Citations
3
PageRank
0.43
References
15
Authors
4
Name                        Order  Citations  PageRank
Antonio Galicia             1      18         1.70
José F. Torres              2      24         2.46
Francisco Martínez-Álvarez  3      155        23.98
Alicia Troncoso Lora        4      117        12.72