Title
Cloud computing-based parallel genetic algorithm for gene selection in cancer classification.
Abstract
Cancer classification is one of the main steps in the patient healing process. This compels modern clinical researchers to use advanced bioinformatics methods for cancer classification. Cancer classification is usually performed using gene expression data obtained from microarray experiments together with advanced machine learning methods. Microarray experiments generate huge amounts of data, and processing these data with machine learning methods is a major challenge. In this study, a two-step classification paradigm that combines genetic algorithm feature selection with machine learning classifiers is used. The genetic algorithm is built in the MapReduce programming style, which makes it highly scalable on a Hadoop cluster. To improve the performance of the proposed algorithm, it is extended into a parallel algorithm that processes microarray data in a distributed manner using the Hadoop MapReduce framework. The algorithm was tested on eleven GEMS data sets (9 tumors, 11 tumors, 14 tumors, brain tumor 1, lung cancer, brain tumor 2, leukemia 1, DLBCL, leukemia 2, SRBCT, and prostate tumor), and its accuracy reached 100% with fewer than 25 selected features. The proposed cloud computing-based MapReduce parallel genetic algorithm performed well on gene expression data. In addition, the scalability of the suggested algorithm is unlimited because of the underlying Hadoop MapReduce platform. The presented results indicate that the proposed method can be effectively applied to real-world microarray data in the cloud environment, and that the Hadoop MapReduce framework substantially decreases computation time.
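The abstract describes the genetic algorithm only as being written "in the MapReduce spirit". The sketch below is a minimal single-machine illustration of how GA-based gene selection splits into map and reduce steps; it is not the authors' implementation. Everything concrete in it is an assumption: chromosomes are binary gene masks, fitness is the holdout accuracy of a nearest-centroid classifier minus a small per-gene penalty, selection is truncation selection with one-point crossover and bit-flip mutation, and a synthetic toy matrix stands in for the GEMS data sets. The names `make_toy_data`, `nearest_centroid_accuracy`, `map_fitness`, `reduce_best` and `evolve` are hypothetical, and Python's built-in `map` and `functools.reduce` stand in for Hadoop mappers and reducers.

```python
import random
from functools import reduce


def make_toy_data(n_samples=80, n_genes=20, seed=0):
    """Hypothetical toy expression matrix: genes 0 and 1 separate the classes, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[0] += 4.0 * label  # informative gene (shifted up in class 1)
        row[1] -= 4.0 * label  # informative gene (shifted down in class 1)
        X.append(row)
        y.append(label)
    return X, y


def nearest_centroid_accuracy(X, y, genes):
    """Holdout accuracy of a nearest-centroid classifier restricted to `genes` (assumed classifier)."""
    if not genes:
        return 0.0
    split = int(0.75 * len(X))  # assumed split: first 75% train, rest test
    centroids = {}
    for label in set(y[:split]):
        rows = [X[i] for i in range(split) if y[i] == label]
        centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for i in range(split, len(X)):
        point = [X[i][g] for g in genes]
        pred = min(centroids, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == y[i]
    return correct / (len(X) - split)


def map_fitness(chromosome, X, y, penalty=0.01):
    """Map step: score one binary gene mask independently of all other chromosomes."""
    genes = [g for g, bit in enumerate(chromosome) if bit]
    # Reward accuracy and lightly penalize large gene subsets (assumed penalty form).
    return (nearest_centroid_accuracy(X, y, genes) - penalty * len(genes), chromosome)


def reduce_best(a, b):
    """Reduce step: keep the fitter of two (fitness, chromosome) pairs."""
    return a if a[0] >= b[0] else b


def evolve(X, y, pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    n_genes = len(X[0])
    # Sparse random initial masks (about 10% of genes switched on).
    population = [[int(rng.random() < 0.1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = list(map(lambda c: map_fitness(c, X, y), population))  # map phase
        gen_best = reduce(reduce_best, scored)                           # reduce phase
        if best is None or gen_best[0] > best[0]:
            best = gen_best
        # Truncation selection: the fitter half survives unchanged and breeds.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        parents = [c for _, c in scored[: pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]                                  # one-point crossover
            child = [bit ^ int(rng.random() < 0.05) for bit in child]  # bit-flip mutation
            children.append(child)
        population = parents + children
    return best


if __name__ == "__main__":
    X, y = make_toy_data()
    fitness, mask = evolve(X, y)
    print("selected genes:", [g for g, bit in enumerate(mask) if bit], "fitness:", round(fitness, 3))
```

The design point the sketch tries to show is that each chromosome's fitness depends only on that chromosome and the data, so the map phase can be spread across many workers. On an actual Hadoop cluster, the body of `map_fitness` would plausibly run inside mappers over partitions of the population, while the reducer would gather the scored pairs for selection; how the authors partitioned the work is not stated in the abstract.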
Year: 2018
DOI: 10.1007/s00521-016-2780-z
Venue: Neural Computing and Applications
Keywords: Cancer classification, Gene expression, Hadoop, MapReduce, Parallel genetic algorithm
Field: Data mining, Data set, Feature selection, Parallel algorithm, Computer science, Microarray analysis techniques, Artificial intelligence, Machine learning, Genetic algorithm, Computation, Cloud computing, Scalability
DocType: Journal
Volume: 30
Issue: 5
ISSN: 1433-3058
Citations: 2
PageRank: 0.36
References: 11
Authors: 3
Dino Keco: order 1, citations 2, PageRank 0.36
Abdulhamit Subasi: order 2, citations 59, PageRank 4.13
Jasmin Kevric: order 3, citations 162, PageRank 7.27