Title
A Dynamic Spark-based Classification Framework for Imbalanced Big Data.
Abstract
Classification of imbalanced big data has attracted considerable attention from researchers over the last decade. Standard classification methods perform poorly at recognizing minority class samples. Several approaches have been introduced to solve the class imbalance problem in big data and improve generalization in classification. However, most of these approaches neglect the effect of border samples on classification performance; high-impact border samples may be exposed to misclassification. In this paper, a Spark Based Mining Framework (SBMF) is proposed to address the imbalanced data problem. Two main modules are designed for this purpose. The first is the Border Handling Module (BHM), which undersamples the low-impact majority border instances and oversamples the minority class instances. The second is the Selective Border Instances sampling (SBI) Module, which enhances the output of the BHM module. The performance of the SBMF framework is evaluated and compared with other recent systems. A number of experiments were performed using moderate and big datasets with different imbalance ratios. Compared to recent works, the results obtained from the SBMF framework show better performance across the different datasets and classifiers.
Year
2018
DOI
10.1007/s10723-018-9465-z
Venue
J. Grid Comput.
Keywords
Big data, Classification, Spark, Sampling methods, Borders, Imbalanced datasets, SMOTE
Field
Data mining, Spark (mathematics), Computer science, Sampling (statistics), Big data, Distributed computing
DocType
Journal
Volume
16
Issue
4
ISSN
1570-7873
Citations
2
PageRank
0.39
References
15
Authors
4
Name                  Order  Citations  PageRank
Nahla B. Abdel-Hamid  1      9          0.91
Sally M. El-Ghamrawy  2      15         4.29
Ali I. Eldesouky      3      36         6.97
Hesham Arafat         4      13         2.58