Title
Feature screening for ultrahigh dimensional categorical data with covariates missing at random.
Abstract
Most existing feature screening methods assume that data are fully observed. Developing screening methods for incomplete data is challenging because traditional missing data analysis techniques cannot be applied directly to the ultrahigh dimensional case. A two-step model-free feature screening procedure is developed for ultrahigh dimensional categorical data in which some covariate values are missing at random. For each covariate with missing data, the first step screens for the variables that enter the unspecified propensity function. In the second step, screening statistics such as the adjusted Pearson Chi-Square statistic are calculated by leveraging the variables obtained in the first step and the special structure of categorical data. Sure screening properties are established for the proposed method. Finite sample performance is investigated through simulation studies and a real data example.
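To make the idea of a Pearson Chi-Square screening statistic concrete, here is a minimal Python sketch of a naive available-case baseline. It is not the two-step procedure or the adjusted statistic described in the abstract: it ignores the propensity function and scores each covariate using only the rows where that covariate is observed. The function names `pearson_chi2` and `available_case_screen`, the use of `np.nan` to mark missing values, and the cutoff `d` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def pearson_chi2(x, y):
    """Pearson chi-square statistic for two fully observed categorical vectors."""
    # Map category labels to integer codes 0..K-1.
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    # Build the contingency table of observed counts.
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    # Expected counts under independence; every row and column total is
    # positive because each level occurs at least once.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def available_case_screen(X, y, d):
    """Rank covariates by an available-case chi-square score (naive baseline).

    X : (n, p) float array of category codes, with np.nan marking missing
        entries (an assumption of this sketch).
    y : (n,) array of fully observed categorical responses.
    d : number of covariates to keep (hypothetical tuning parameter).
    Returns the indices of the d highest-scoring covariates and all scores.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = np.zeros(X.shape[1])
    for k in range(X.shape[1]):
        observed = ~np.isnan(X[:, k])
        if observed.any():
            # Only rows where covariate k is observed contribute to its score.
            scores[k] = pearson_chi2(X[observed, k], y[observed])
    top = np.argsort(-scores)[:d]
    return top, scores
```

When the missingness of a covariate depends on other variables, such an available-case score can be biased, which is the situation the abstract's first step (screening for the variables in the propensity function) is meant to address.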
Year
2020
DOI
10.1016/j.csda.2019.106824
Venue
Computational Statistics & Data Analysis
Keywords
Feature screening, Missing at random, Missing covariate, Pearson Chi-Square statistic, Sure screening property
Field
Covariate, Categorical variable, Missing data, Statistics, Mathematics
DocType
Journal
Volume
142
ISSN
0167-9473
Citations
0
PageRank
0.34
References
0
Authors
3
Name        Order  Citations  PageRank
Lyu Ni      1      0          0.34
Fang Fang   2      0          0.68
Jun Shao    3      2          1.86