Title | ||
---|---|---|
A distributed evolutionary multivariate discretizer for Big Data processing on Apache Spark. |
Abstract | ||
---|---|---|
Today, the Big Data phenomenon is overwhelming our capacity to extract relevant knowledge with classical machine learning techniques. Discretization, as a form of data reduction, is a practical way to reduce this complexity. However, standard discretizers are not designed to scale to such volumes of data. This paper proposes a distributed discretization algorithm for Big Data analytics based on evolutionary optimization. Compared with a distributed discretizer based on the Minimum Description Length Principle, our method yields more accurate and simpler solutions in reasonable time. |
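
The abstract describes evolutionary discretization only at a high level. The Python sketch below illustrates the general idea on a single machine, under stated assumptions: each candidate cut point (a midpoint between adjacent values whose class labels differ) is one bit of a chromosome, and a small genetic algorithm searches for a subset of cuts, across all attributes jointly, that keeps the discretized data consistent with the class labels while using few cuts. It is not the paper's distributed Spark algorithm. The fitness function (a weighted sum of the inconsistency rate and the fraction of cuts kept, with a hypothetical weight `alpha`), the genetic operators, and all function names are illustrative assumptions.

```python
import bisect
import random
from collections import Counter, defaultdict


def candidate_cuts(values, labels):
    """Boundary points: midpoints between adjacent sorted values whose labels differ."""
    pairs = sorted(zip(values, labels))
    cuts = {
        (x1 + x2) / 2
        for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])
        if y1 != y2 and x1 != x2
    }
    return sorted(cuts)


def discretize_row(row, cuts_per_attr):
    """Replace each value by the index of the interval it falls into."""
    return tuple(bisect.bisect_right(cuts, v) for v, cuts in zip(row, cuts_per_attr))


def inconsistency_rate(X, y, cuts_per_attr):
    """Fraction of rows outside the majority class of their discretized pattern."""
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[discretize_row(row, cuts_per_attr)][label] += 1
    wrong = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return wrong / len(y)


def decode(mask, candidates):
    """Split a flat bit mask over all candidate cuts into per-attribute cut lists."""
    cuts, start = [], 0
    for attr_cands in candidates:
        bits = mask[start:start + len(attr_cands)]
        cuts.append([c for c, keep in zip(attr_cands, bits) if keep])
        start += len(attr_cands)
    return cuts


def fitness(mask, X, y, candidates, alpha=0.7):
    """Lower is better: weighted inconsistency plus the fraction of cuts kept.

    alpha is a hypothetical trade-off weight, not a value taken from the paper.
    """
    inconsistency = inconsistency_rate(X, y, decode(mask, candidates))
    return alpha * inconsistency + (1 - alpha) * sum(mask) / len(mask)


def evolve(X, y, pop_size=20, generations=50, p_mut=0.05, seed=0):
    """Toy generational GA over bit masks; returns the best cut lists found."""
    rng = random.Random(seed)
    candidates = [candidate_cuts([row[j] for row in X], y) for j in range(len(X[0]))]
    n_bits = sum(len(c) for c in candidates)
    if n_bits == 0:
        return [[] for _ in candidates]

    def score(mask):
        return fitness(mask, X, y, candidates)

    def tournament(pop):
        # Binary tournament: the fitter of two random individuals wins.
        return min(rng.sample(pop, 2), key=score)

    pop = [[rng.random() < 0.5 for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        nxt = pop[:2]  # elitism: the two best masks survive unchanged
        while len(nxt) < pop_size:
            a, b = tournament(pop), tournament(pop)
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return decode(min(pop, key=score), candidates)


if __name__ == "__main__":
    # Toy data: the class depends only on whether the first attribute exceeds a value between 3 and 6.
    X = [[1, 3], [2, 9], [3, 1], [6, 2], [7, 8], [8, 5]]
    y = [0, 0, 0, 1, 1, 1]
    print(evolve(X, y))
```

Because the inconsistency rate is computed over whole multi-attribute patterns rather than one attribute at a time, the search in this toy version is multivariate: a cut on one attribute can make cuts on another redundant, and the size penalty then lets the algorithm drop them.
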
Year | DOI | Venue |
---|---|---|
2018 | 10.1016/j.swevo.2017.08.005 | Swarm and Evolutionary Computation |
Keywords | Field | DocType |
---|---|---|
Discretization,Evolutionary computation,Big Data,Data Mining,Apache Spark | Big data processing,Discretization,Data mining,Apache Spark,Multivariate statistics,Computer science,Minimum description length,Artificial intelligence,Big data,Machine learning,Data reduction | Journal |
Volume | ISSN | Citations |
---|---|---|
38 | 2210-6502 | 4 |
PageRank | References | Authors |
---|---|---|
0.40 | 19 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sergio Ramírez-Gallego | 1 | 98 | 6.99 |
Salvador García | 2 | 4151 | 118.45 |
José Manuel Benítez | 3 | 888 | 56.02 |
Francisco Herrera | 4 | 27391 | 1168.49 |