Title
Parallel particle swarm optimization classification algorithm variant implemented with Apache Spark
Abstract
With the rapid development of technologies such as the internet, the growing amount of data collected or generated in sectors such as agriculture, biomedicine, and finance challenges the scientific community because of its volume and complexity. Furthermore, the need for analysis tools that extract useful information for decision support has been receiving increasing attention as researchers seek scalable alternatives to traditional algorithms. In this paper, we propose a scalable design and implementation of a particle swarm optimization classification (SCPSO) approach based on the Apache Spark framework. The main idea of the SCPSO algorithm is to use particle swarm optimization to find the optimal centroid for each target label and then assign each unlabeled data point to the closest centroid. Two variants, SCPSO-F1 and SCPSO-F2, which differ in their fitness functions, were proposed and tested on real data sets to evaluate their scalability and performance. The experimental results revealed that both variants scale very well with increasing data set sizes: the speedup of SCPSO-F2 is almost identical to linear speedup, while that of SCPSO-F1 is very close to linear. Thus, SCPSO-F1 and SCPSO-F2 can be efficiently parallelized using the Apache Spark framework.
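The centroid-based idea described in the abstract (optimize one centroid per label with particle swarm optimization, then label a point by its nearest centroid) can be sketched on a single machine as below. This is a minimal illustration under stated assumptions, not the authors' SCPSO implementation: the fitness function (mean Euclidean distance from each training point to the centroid of its own label) is a hypothetical stand-in, since the paper's F1 and F2 fitness functions are not given here, and the PSO parameters (`w`, `c1`, `c2`, swarm size, iteration count) are common textbook values chosen for illustration. The per-point distance sum inside `fitness` is a map-then-reduce, which is presumably the kind of work one would distribute with Apache Spark, but that mapping is an assumption.

```python
import random
from math import dist


def fitness(centroids, points, labels):
    # Hypothetical fitness (not the paper's F1/F2): mean Euclidean distance
    # from each training point to the centroid of its own label. Lower is better.
    total = sum(dist(p, centroids[y]) for p, y in zip(points, labels))
    return total / len(points)


def pso_centroids(points, labels, n_classes, n_particles=20, iters=100,
                  w=0.72, c1=1.49, c2=1.49, seed=0):
    """Standard global-best PSO; each particle encodes one centroid per label."""
    rng = random.Random(seed)
    d = len(points[0])
    lo = [min(p[j] for p in points) for j in range(d)]
    hi = [max(p[j] for p in points) for j in range(d)]

    def random_position():
        # n_classes centroids, each drawn uniformly inside the data's bounding box.
        return [[rng.uniform(lo[j], hi[j]) for j in range(d)] for _ in range(n_classes)]

    pos = [random_position() for _ in range(n_particles)]
    vel = [[[0.0] * d for _ in range(n_classes)] for _ in range(n_particles)]
    pbest = [[c[:] for c in x] for x in pos]
    pbest_f = [fitness(x, points, labels) for x in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = [c[:] for c in pbest[g]], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for k in range(n_classes):
                for j in range(d):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward global best.
                    vel[i][k][j] = (w * vel[i][k][j]
                                    + c1 * r1 * (pbest[i][k][j] - pos[i][k][j])
                                    + c2 * r2 * (gbest[k][j] - pos[i][k][j]))
                    pos[i][k][j] += vel[i][k][j]
            f = fitness(pos[i], points, labels)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = [c[:] for c in pos[i]], f
                if f < gbest_f:
                    gbest, gbest_f = [c[:] for c in pos[i]], f
    return gbest


def classify(point, centroids):
    # Assign the label whose optimized centroid is closest to the point.
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))


if __name__ == "__main__":
    # Toy two-class data set (illustrative only).
    points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    labels = [0, 0, 0, 1, 1, 1]
    centroids = pso_centroids(points, labels, n_classes=2)
    print(classify((0.3, 0.2), centroids), classify((4.9, 5.2), centroids))
```

Encoding all class centroids in one particle keeps the search to a single swarm; the trade-off is that the search space grows with the number of labels times the number of features.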
Year: 2020
DOI: 10.1002/cpe.5451
Venue: Concurrency and Computation: Practice and Experience
Keywords: big data analytics, classification, particle swarm optimization
Field: Particle swarm optimization, Spark (mathematics), Computer science, Parallel computing
DocType: Journal
Volume: 32
Issue: 2
ISSN: 1532-0626
Citations: 1
PageRank: 0.36
References: 0
Authors: 2
Jamil Al-Sawwa (order: 1; citations: 1; PageRank: 0.36)
Simone A. Ludwig (order: 2; citations: 1309; PageRank: 179.41)