Abstract | ||
---|---|---|
This paper presents several scalable methods to predict very long time series, such as those sampled at a high frequency. The Apache Spark framework for distributed computing is used to make the methods scalable, specifically through its existing MLlib machine learning library. Since MLlib does not support multivariate regression, the forecasting problem has been split into h forecasting subproblems, where h is the number of future values to predict. Representative forecasting methods of different kinds have then been chosen, including models based on trees, two ensemble techniques (gradient-boosted trees and random forests), and linear regression as a reference method. Finally, the methodology has been tested on a real-world dataset of Spanish electricity load with a ten-minute frequency. |
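
The split into h subproblems described in the abstract can be illustrated with a minimal sketch. It is not the authors' Spark code: it assumes a sliding window of `w` past values as features (the abstract does not state how features are built) and swaps the MLlib regressors for a plain-Python least-squares line on the last window value, only to stay self-contained. All function names here are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], float]          # (window of past values, one target)
Model = Callable[[Sequence[float]], float]  # maps a window to one predicted value


def make_subproblems(series: Sequence[float], w: int, h: int) -> List[List[Sample]]:
    """Build h single-output datasets from one series.

    Dataset j pairs each window of w consecutive values with the value
    observed j + 1 steps after the end of that window.
    """
    if w < 1 or h < 1 or len(series) < w + h:
        raise ValueError("need w >= 1, h >= 1 and at least w + h observations")
    subproblems: List[List[Sample]] = [[] for _ in range(h)]
    for start in range(len(series) - w - h + 1):
        window = list(series[start:start + w])
        for j in range(h):
            subproblems[j].append((window, series[start + w + j]))
    return subproblems


def fit_last_value_line(data: List[Sample]) -> Model:
    """Assumed stand-in regressor: least-squares line y = a + b * (last window value)."""
    xs = [window[-1] for window, _ in data]
    ys = [target for _, target in data]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = 0.0 if var_x == 0 else cov_xy / var_x
    a = mean_y - b * mean_x
    return lambda window: a + b * window[-1]


def fit_direct(series: Sequence[float], w: int, h: int,
               fit: Callable[[List[Sample]], Model] = fit_last_value_line) -> List[Model]:
    """Train one independent single-output model per forecast step."""
    return [fit(data) for data in make_subproblems(series, w, h)]


def forecast(models: List[Model], window: Sequence[float]) -> List[float]:
    """Predict the next h values by querying each per-step model on the same window."""
    return [model(window) for model in models]


# Toy series (a straight line), not the electricity data from the paper.
series = [float(t) for t in range(20)]
models = fit_direct(series, w=3, h=4)
prediction = forecast(models, series[-3:])
```

Because each of the h models has a single output, any single-output regressor can be passed as `fit`. The per-step models are also independent of one another, so they can be trained separately.
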
Year | DOI | Venue |
---|---|---|
2017 | 10.1007/978-3-319-59147-6_15 | ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2017, PT II |
Keywords | Field | DocType |
Big data,Scalable,Electricity time series,Forecasting | Data mining,Spark (mathematics),Electricity,Multivariate statistics,Computer science,Sampling (signal processing),Artificial intelligence,Random forest,Big data,Machine learning,Linear regression,Scalability | Conference |
Volume | ISSN | Citations |
10306 | 0302-9743 | 6 |
PageRank | References | Authors |
0.43 | 15 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Antonio Galicia | 1 | 18 | 1.70 |
José F. Torres | 2 | 24 | 2.46 |
Francisco Martínez-Álvarez | 3 | 155 | 23.98 |
Alicia Troncoso | 4 | 153 | 20.88 |