Title
A distributed evolutionary multivariate discretizer for Big Data processing on Apache Spark.
Abstract
Nowadays the phenomenon of Big Data is overwhelming our capacity to extract relevant knowledge through classical machine learning techniques. Discretization (as part of data reduction) is presented as a real solution to reduce this complexity. However, standard discretizers are not designed to perform well with such amounts of data. This paper proposes a distributed discretization algorithm for Big Data analytics based on evolutionary optimization. After comparing with a distributed discretizer based on the Minimum Description Length Principle, we have found that our solution yields more accurate and simpler solutions in reasonable time.
Year
2018
DOI
10.1016/j.swevo.2017.08.005
Venue
Swarm and Evolutionary Computation
Keywords
Discretization, Evolutionary computation, Big Data, Data Mining, Apache Spark
Field
Big data processing, Discretization, Data mining, Spark (mathematics), Multivariate statistics, Computer science, Minimum description length, Artificial intelligence, Big data, Machine learning, Data reduction
DocType
Journal
Volume
38
ISSN
2210-6502
Citations
4
PageRank
0.40
References
19
Authors
4
Name | Order | Citations | PageRank
Sergio Ramírez-Gallego | 1 | 98 | 6.99
Salvador García | 2 | 4151 | 118.45
José Manuel Benítez | 3 | 888 | 56.02
Francisco Herrera | 4 | 27391 | 1168.49