Title
A reliable method for colorectal cancer prediction based on feature selection and support vector machine.
Abstract
Colorectal cancer (CRC) is a common cancer responsible for approximately 600,000 deaths per year worldwide, so it is important to identify the related factors and detect the cancer accurately. However, timely and accurate prediction of the disease is challenging. In this study, we build an integrated model based on logistic regression (LR) and support vector machine (SVM) to classify samples as cancer or normal. From a range of candidate factors (location, age, gender, and body mass index (BMI) of the subject, and tumor type, tumor grade, and DNA of the cancer), we use logistic regression to select the most significant factors (p < 0.05) as the main features. With these features, we design a grid-search SVM model and compare different kernel types (linear, radial basis function (RBF), sigmoid, and polynomial). The logistic regression results indicate that Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), BMI (AUC 0.777), and age (AUC 0.710), as well as their combination (AUC 0.942), are effective for CRC detection. The best kernel type is RBF, which achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10. This study provides a new method for colorectal cancer prediction based on independent risk factors.

Graphical abstract: Flow chart depicting the method adopted in the study. LR and the ROC curve are used to select independent features as input to the SVM. SVM kernel selection compares the linear, RBF, sigmoid, and polynomial kernel types to find the best kernel function for classification, and the result shows that the best kernel is RBF. The classification performance of the LR + RF, LR + NB, LR + KNN, and LR + ANNs models is compared with that of LR + SVM. After these steps, cancer and healthy individuals can be classified, and the best model is selected.
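The pipeline described in the abstract (select features, then grid-search an SVM over kernel types) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes scikit-learn, uses synthetic data in place of the study's samples, and treats k as the number of cross-validation folds. Because scikit-learn's logistic regression does not report p-values, the sketch swaps in a single-feature ROC AUC filter for the p < 0.05 selection step. The 0.6 AUC threshold and the C and gamma grids are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the study's data (assumption): 200 samples, 8 candidate factors.
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)


def single_feature_auc(column, labels):
    """ROC AUC of one raw feature used as a score, regardless of its direction."""
    auc = roc_auc_score(labels, column)
    return max(auc, 1.0 - auc)


# Feature selection: keep features whose single-feature AUC clears a hypothetical threshold.
aucs = np.array([single_feature_auc(X[:, j], y) for j in range(X.shape[1])])
selected = np.flatnonzero(aucs >= 0.6)
if selected.size == 0:
    # Fall back to the single best feature so the SVM always has an input.
    selected = np.array([int(np.argmax(aucs))])
X_sel = X[:, selected]

# Grid search over the four kernel types named in the abstract.
# The C and gamma values are illustrative, not taken from the paper.
param_grid = {
    "svc__kernel": ["linear", "rbf", "sigmoid", "poly"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}

results = {}
for k in (5, 10):  # k-fold cross-validation with k = 5 and k = 10
    search = GridSearchCV(make_pipeline(StandardScaler(), SVC()),
                          param_grid, cv=k, scoring="accuracy")
    search.fit(X_sel, y)
    results[k] = (search.best_params_["svc__kernel"], search.best_score_)
    print(f"k={k}: best kernel={results[k][0]}, mean CV accuracy={results[k][1]:.3f}")
```

For brevity, the feature filter here runs once on the full data set. That can inflate the cross-validated accuracy, so a more careful version would repeat the selection inside each fold.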
Year
2019
DOI
10.1007/s11517-018-1930-0
Venue
Medical & biological engineering & computing
Keywords
Colorectal cancer, Logistic regression, Support vector machine, Microbiome
Field
Kernel (linear algebra), Computer vision, Feature selection, Body mass index, Support vector machine, Artificial intelligence, Statistics, Colorectal cancer, Logistic regression, Cancer, Mathematics, Sigmoid function
DocType
Journal
Volume
57
Issue
4
ISSN
1741-0444
Citations
1
PageRank
0.36
References
14
Authors
6
Name            Order  Citations  PageRank
Dan Zhao        1      172        24.34
Hong Liu        2      65         5.30
Yuanjie Zheng   3      671        55.01
Yanlin He       4      2          1.38
Dianjie Lu      5      52         10.88
Chen Lyu        6      36         6.35