Paper Info

Title
Fused variable screening for massive imbalanced data.

Abstract
Imbalanced data, in which the classes or categories follow an unequal or highly skewed distribution, are pervasive in many scientific fields, with applications ranging from bioinformatics and text classification to face recognition and fraud detection. Imbalanced data in modern science are often of massive size and high dimensionality; gene expression data for diagnosing rare diseases are one example. To address this issue, a fused screening procedure is proposed for dimension reduction with large-scale, high-dimensional imbalanced data under repeated case-control sampling. The proposed method has several advantages: it is model-free, requiring no model specification for the underlying distribution; it is relatively inexpensive computationally because it uses subsampling; and it is robust to outliers in the predictors. The theoretical properties are established under regularity conditions. Numerical studies, including extensive simulations and a real data example, confirm that the proposed method performs well in practical settings.
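
The abstract gives only the outline of the procedure (repeated case-control subsampling, a model-free rank-based utility, and fusion across subsamples), so the following Python sketch is an illustration under stated assumptions, not the paper's estimator. It assumes a binary label with 1 marking the rare class, uses |AUC - 0.5| (a Mann-Whitney rank statistic) as a stand-in marginal utility, and fuses by simple averaging. The names rank_utility, fused_screen, n_rounds, controls_per_case and top_k are hypothetical.

```python
import numpy as np


def rank_utility(X, y):
    """Per-column |AUC - 0.5|, a rank-based marginal utility.

    Assumption: stands in for the paper's statistic, which the abstract
    does not specify. Ties are not handled (continuous predictors assumed).
    """
    n = X.shape[0]
    # Column-wise ranks 1..n via a double argsort.
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    n1 = int(y.sum())
    n0 = n - n1
    # Mann-Whitney U statistic of the cases, one value per column.
    u = ranks[y == 1].sum(axis=0) - n1 * (n1 + 1) / 2
    return np.abs(u / (n1 * n0) - 0.5)


def fused_screen(X, y, n_rounds=20, controls_per_case=1, top_k=10, seed=0):
    """Average the utility over repeated case-control subsamples.

    Each round keeps every case and draws a fresh random set of controls.
    Returns the indices of the top_k predictors and the fused scores.
    """
    rng = np.random.default_rng(seed)
    cases = np.flatnonzero(y == 1)
    controls = np.flatnonzero(y == 0)
    m = min(len(controls), controls_per_case * len(cases))
    scores = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        drawn = rng.choice(controls, size=m, replace=False)
        sub = np.concatenate([cases, drawn])
        scores += rank_utility(X[sub], y[sub])
    scores /= n_rounds
    top = np.argsort(scores)[::-1][:top_k]
    return top, scores


# Synthetic demo: about 3% cases; only predictors 0 and 1 carry signal.
demo_rng = np.random.default_rng(1)
y_demo = (demo_rng.random(5000) < 0.03).astype(int)
X_demo = demo_rng.standard_normal((5000, 200))
X_demo[y_demo == 1, :2] += 2.0
top_demo, _ = fused_screen(X_demo, y_demo, top_k=5)
print(top_demo)
```

Because the utility depends on the predictors only through their ranks, a few extreme predictor values change the scores very little, which matches the robustness the abstract claims. Each round touches only the cases plus an equal-sized draw of controls, so the cost scales with the size of the rare class rather than with the full sample.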

Year	2020
DOI	10.1016/j.csda.2019.06.013
Venue	Computational Statistics & Data Analysis
Keywords	Case-control sampling, High dimension, Imbalanced data, Model-free screening, Rank correlation
Field	Data mining, Facial recognition system, Dimensionality reduction, Outlier, Curse of dimensionality, Specification, Statistics, Mathematics
DocType	Journal
Volume	141
ISSN	0167-9473
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Jinhan Xie	1	0	0.34
Meiling Hao	2	1	1.41
Wenxin Liu	3	63	11.65
Yuanyuan Lin	4	0	0.34

Cited by (0 rows)

References (0 rows)
