Title
Nonparametric variable selection and classification: The CATCH algorithm
Abstract
The problem of classifying a categorical response Y is considered in a nonparametric framework. The distribution of Y depends on a vector of predictors X, where the coordinates X_j of X may be continuous, discrete, or categorical. An algorithm is constructed to select the variables to be used for classification. For each variable X_j, an importance score s_j is computed to measure the strength of association of X_j with Y. The algorithm deletes X_j if s_j falls below a certain threshold. It is shown in Monte Carlo simulations that the algorithm has a high probability of selecting only variables associated with Y. Moreover, when this variable selection rule is used for dimension reduction prior to applying classification procedures, it improves the performance of these procedures. The approach for computing importance scores is based on root chi-square type statistics computed for randomly selected regions (tubes) of the sample space. The size and shape of the regions are adjusted iteratively and adaptively using the data to enhance the ability of the importance score to detect local relationships between the response and the predictors. These local scores are then averaged over the tubes to form a global importance score s_j for variable X_j. When confounding and spurious associations are issues, the nonparametric importance score for variable X_j is computed conditionally by using tubes to restrict the other variables. This variable selection procedure is called CATCH (Categorical Adaptive Tube Covariate Hunting). Asymptotic properties, including consistency, are established.
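The scoring idea in the abstract can be illustrated with a much-simplified sketch. This is not the published CATCH procedure: it assumes numeric predictors, uses fixed-radius tubes with no adaptive resizing or reshaping, splits X_j at its within-tube median, and takes sqrt(chi-square / tube size) as the local score. The function names, the radius, the minimum tube size, and the threshold of 0.3 are all hypothetical choices made for illustration.

```python
import numpy as np


def chi2_stat(a, b):
    """Pearson chi-square statistic for the contingency table of two 1-D label arrays."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)  # count co-occurrences of the two labels
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def importance_score(X, y, j, n_tubes=50, radius=1.0, min_size=20, seed=0):
    """Average local root chi-square score of X[:, j] against y over random tubes.

    Assumption: a tube keeps the points whose OTHER coordinates lie within
    `radius` (sup-norm) of a randomly chosen sample point; X[:, j] is unrestricted.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    others = [k for k in range(d) if k != j]
    local = []
    for c in rng.integers(0, n, size=n_tubes):
        inside = np.all(np.abs(X[:, others] - X[c, others]) <= radius, axis=1)
        xj, yt = X[inside, j], y[inside]
        if xj.size < min_size:
            continue  # skip tubes with too few points for a stable statistic
        halves = xj > np.median(xj)  # assumed discretization: median split
        local.append(np.sqrt(chi2_stat(halves, yt) / xj.size))
    return float(np.mean(local)) if local else 0.0


def select_variables(X, y, threshold=0.3, **kwargs):
    """Keep variable j when its score reaches the (hypothetical) threshold."""
    scores = [importance_score(X, y, j, **kwargs) for j in range(X.shape[1])]
    selected = [j for j, s in enumerate(scores) if s >= threshold]
    return selected, scores
```

Calling `select_variables(X, y)` on an `(n, d)` float array `X` and a length-`n` label array `y` returns the indices of the retained variables together with all scores. Restricting only the other coordinates inside each tube mirrors the conditional scoring described in the abstract, while the thresholding step corresponds to deleting X_j when s_j is too small.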
Year: 2014
DOI: 10.1016/j.csda.2013.10.024
Venue: Computational Statistics & Data Analysis
Keywords: variable selection procedure, local score, catch algorithm, categorical response y, predictors x, algorithm deletes x, variable x, importance score, nonparametric importance score, nonparametric variable selection, variable selection rule, global importance score, marketing
Field: Econometrics, Feature selection, Nonparametric statistics, Engineering, Statistics
DocType: Journal
Volume: 72
ISSN: 0167-9473
Citations: 0
PageRank: 0.34
References: 2
Authors: 4
Name             Order  Citations  PageRank
Shijie Tang      1      4          2.45
Lisha Chen       2      31         4.49
Kam-Wah Tsui     3      12         3.37
Kjell A. Doksum  4      0          0.34