Title
On the Usage of the Probability Integral Transform to Reduce the Complexity of Multi-Way Fuzzy Decision Trees in Big Data Classification Problems
Abstract
We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of the training data. A two-step process is applied: 1) the original distribution is transformed into a standard uniform distribution by means of the probability integral transform; since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) a Ruspini strong fuzzy partition is constructed in the transformed attribute space using a fixed number of evenly spaced triangular membership functions. Despite this transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
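The two-step process described in the abstract can be illustrated with a minimal single-machine sketch in NumPy. This is not the authors' distributed implementation: the function names, the choice of q = 100 quantiles, and the use of five fuzzy sets are assumptions made for illustration, and the attribute is assumed to be continuous, so that the quantile cut points are strictly increasing.

```python
import numpy as np

def fit_quantiles(x, q=100):
    """Approximate the CDF of one attribute by its q-quantiles (assumed q=100)."""
    probs = np.linspace(0.0, 1.0, q + 1)
    cuts = np.quantile(x, probs)
    return probs, cuts

def to_uniform(x, probs, cuts):
    """Probability integral transform via a piecewise-linear empirical CDF."""
    return np.interp(x, cuts, probs)

def from_uniform(u, probs, cuts):
    """Inverse transform (quantile function): map [0, 1] back to the original space."""
    return np.interp(u, probs, cuts)

def triangular_memberships(u, n_sets=5):
    """Ruspini strong partition of [0, 1] with evenly spaced triangular sets.

    Returns an array of shape (len(u), n_sets); each row sums to 1.
    """
    centers = np.linspace(0.0, 1.0, n_sets)
    width = centers[1] - centers[0]
    u = np.asarray(u, dtype=float)[:, None]
    return np.clip(1.0 - np.abs(u - centers) / width, 0.0, 1.0)

def centers_in_original_space(probs, cuts, n_sets=5):
    """Recover the fuzzy-set centers in the original attribute space."""
    return from_uniform(np.linspace(0.0, 1.0, n_sets), probs, cuts)
```

For a skewed attribute, `to_uniform` spreads the values roughly evenly over [0, 1], so the evenly spaced triangles in the transformed space correspond to fuzzy sets in the original space that are narrow where the data are dense and wide where they are sparse. With five sets, the centers returned by `centers_in_original_space` coincide with the minimum, the quartiles, and the maximum of the training sample.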
Year
2018
DOI
10.1109/BigDataCongress.2018.00011
Venue
2018 IEEE International Congress on Big Data (BigData Congress)
Keywords
Fuzzy Decision Trees, Probability Integral Transform, Quantile Function, MapReduce, Apache Spark, Big Data
Field
Decision tree, Data mining, Computer science, Uniform distribution (continuous), Algorithm, Quantile function, Fuzzy set, Cumulative distribution function, Big data, Fuzzy decision tree, Probability integral transform
DocType
Conference
ISSN
2379-7703
ISBN
978-1-5386-7233-4
Citations
0
PageRank
0.34
References
10
Authors
4
Name                Order    Citations    PageRank
Mikel Elkano        1        89           7.82
Mikel Uriz          2        0            0.34
Humberto Bustince   3        1938         134.10
Mikel Galar         4        1003         40.90