Feature Selection And Resampling In Class Imbalance Learning: Which Comes First? An Empirical Study In The Biological Domain - Citegraph

Paper Info

Title
Feature Selection And Resampling In Class Imbalance Learning: Which Comes First? An Empirical Study In The Biological Domain

Abstract
Class imbalance arises in many bioinformatics and biomedical applications, and dimension reduction in the feature space is often needed when building prediction models on a dataset. When both issues must be addressed simultaneously for skewed (imbalanced) datasets, machine learning practitioners and researchers may ask: should feature selection be conducted before or after the resampling methods used to combat the skewness of a dataset? Although feature selection and class imbalance learning have each been widely studied in the literature, few studies have investigated them jointly. This paper presents a first empirical study of the performance of the two opposing pipelines for binary imbalance learning, i.e., feature selection followed by resampling, or resampling followed by feature selection. We carry out the study on 35 publicly available datasets from the biological field, using 9 feature selection methods, 6 resampling approaches for class imbalance learning, and 3 well-known classifiers. Our experiments reveal that there is no consistent winner between the two pipelines, so practitioners should test both pipelines to derive the best classification model for imbalance learning; in particular, the resampling-before-feature-selection pipeline should not be neglected. However, we also show that the feature-selection-before-resampling pipeline outperforms the other in more cases than not.
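The two pipelines compared in the abstract differ only in the order of two preprocessing steps. The following is a minimal Python sketch of that ordering, assuming random oversampling as the resampling method and an absolute Pearson-correlation filter as the feature selector. These are illustrative stand-ins chosen here, not the 9 feature selection methods, 6 resampling approaches, or 3 classifiers used in the paper, and all function names are hypothetical. In a real experiment both steps would be fitted on the training split only.

```python
import random

# Hypothetical helper: random oversampling (a stand-in for the paper's resampling methods).
def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def abs_corr(xs, ys):
    """Absolute Pearson correlation; returns 0.0 for a constant input."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5

# Hypothetical helper: filter feature selection (a stand-in for the paper's FS methods).
def select_top_k(X, y, k):
    """Return the indices of the k features most correlated with the label."""
    n_features = len(X[0])
    scores = [abs_corr([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]

def fs_then_resample(X, y, k):
    """Pipeline 1: rank features on the original (imbalanced) data, then oversample."""
    cols = select_top_k(X, y, k)
    Xr, yr = random_oversample(project(X, cols), y)
    return cols, Xr, yr

def resample_then_fs(X, y, k):
    """Pipeline 2: oversample first, then rank features on the balanced data."""
    Xr, yr = random_oversample(X, y)
    cols = select_top_k(Xr, yr, k)
    return cols, project(Xr, cols), yr

# Toy imbalanced data: 90 negatives, 10 positives, 5 features; only feature 0 is shifted.
rng = random.Random(1)
y = [0] * 90 + [1] * 10
X = [[rng.gauss(2.0 * label if j == 0 else 0.0, 1.0) for j in range(5)] for label in y]

for name, pipeline in [("FS -> resampling", fs_then_resample),
                       ("resampling -> FS", resample_then_fs)]:
    cols, Xr, yr = pipeline(X, y, k=2)
    print(name, "selected:", cols, "class counts:", yr.count(0), yr.count(1))
```

Because a correlation score depends on the class proportions, the feature ranking computed after oversampling can differ from the one computed on the original data, which is one way the two orders can end up training different models. Any downstream classifier would then be trained on the `Xr`, `yr` returned by either pipeline.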

Year: 2017
DOI: 10.1109/BIBM.2017.8217782
Venue: 2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)
DocType: Conference
ISSN: 2156-1125
Citations: 0
PageRank: 0.34
References: 0
Authors: 3

Authors (3 rows)

Name	Order	Citations	PageRank
Chongsheng Zhang	1	60	3.61
Jingjun Bi	2	0	0.34
Paolo Soda	3	407	39.44

Cited by (0 rows)

References (0 rows)
