Title
On The Use Of Random Discretization And Dimensionality Reduction In Ensembles For Big Data
Abstract
Massive data growth in recent years has made data reduction techniques especially popular because of their ability to reduce this enormous amount of data, also called Big Data. Random Projection Random Discretization is an innovative ensemble method. It uses two data reduction techniques to create more informative data: Random Discretization, proposed by its authors, and Random Projections (RP). However, RP has some shortcomings that can be solved by more powerful methods such as Principal Components Analysis (PCA). To tackle this problem, we propose a new ensemble method, named Random Discretization Dimensionality Reduction Ensemble, which uses the Apache Spark framework and PCA for dimensionality reduction. In our experiments on five Big Data datasets, we show that our proposal achieves better prediction performance than the original algorithm and Random Forest.
Year
2018
DOI
10.1007/978-3-319-92639-1_2
Venue
HYBRID ARTIFICIAL INTELLIGENT SYSTEMS (HAIS 2018)
Keywords
Big Data, Ensemble, Discretization, Apache Spark, PCA, Data reduction
Field
Random projection, Discretization, Data mining, Dimensionality reduction, Spark (mathematics), Pattern recognition, Computer science, Artificial intelligence, Random forest, Big data, Principal component analysis, Data reduction
DocType
Conference
Volume
10870
ISSN
0302-9743
Citations
0
PageRank
0.34
References
8
Authors
4
Name, Order, Citations, PageRank
Diego García-Gil, 1, 19, 2.69
Sergio Ramírez-Gallego, 2, 98, 6.99
Salvador García, 3, 4151, 118.45
Francisco Herrera, 4, 27391, 1168.49