Title
Optimality of graphlet screening in high dimensional variable selection
Abstract
Consider a linear model Y = Xβ + σz, where X has n rows and p columns and z ∼ N(0, I_n). We assume both p and n are large, including the case of p ≫ n. The unknown signal vector β is assumed to be sparse in the sense that only a small fraction of its components is nonzero. The goal is to identify such nonzero coordinates (i.e., variable selection). We are primarily interested in the regime where signals are both rare and weak, so that successful variable selection is challenging but still possible. We assume the Gram matrix G = X′X is sparse in the sense that each row has relatively few large entries (the diagonals of G are normalized to 1). The sparsity of G naturally induces the sparsity of the so-called Graph of Strong Dependence (GOSD). The key insight is that there is an interesting interplay between the signal sparsity and the graph sparsity: in a broad context, the signals decompose into many small-size components of the GOSD that are disconnected from each other. We propose Graphlet Screening for variable selection. This is a two-step Screen and Clean procedure: in the first step, we screen subgraphs of the GOSD with sequential χ²-tests, and in the second step, we clean with a penalized MLE. The main methodological innovation is to use the GOSD to guide both the screening and cleaning processes. For any variable selection procedure β̂, we measure its performance by the Hamming distance between the sign vectors of β̂ and β, and assess optimality by the minimax Hamming distance. Compared with more stringent criteria such as exact support recovery or the oracle property, which demand strong signals, the Hamming distance criterion is more appropriate for weak signals since it naturally allows a small fraction of errors. We show that in a broad class of situations, Graphlet Screening achieves the optimal rate of convergence in terms of the Hamming distance.
Unlike Graphlet Screening, well-known procedures such as the L0/L1-penalization methods do not utilize local graphical structure for variable selection, so they generally do not achieve the optimal rate of convergence, even in very simple settings and even if the tuning parameters are ideally set. The presented algorithm is implemented as the R-CRAN package ScreenClean and in Matlab (available at http://www.stat.cmu.edu/~jiashun/Research/software/GS-matlab/).
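The abstract's pipeline (threshold the Gram matrix to form the GOSD, split variables into small disconnected components, fit each component separately, and score by the Hamming distance between sign vectors) can be illustrated with a minimal Python sketch. This is not the paper's method: the exhaustive L0-penalized least squares below is a toy stand-in for the sequential χ² screening and penalized MLE, and the threshold `delta` and penalty `lam` are illustrative choices, not the paper's tuning parameters.

```python
# Toy sketch of a Graphlet-Screening-style two-step fit (illustrative only;
# the paper uses sequential chi^2 screening followed by a penalized MLE).
import numpy as np
from itertools import combinations

def gosd_components(G, delta=0.1):
    """Connected components of the Graph of Strong Dependence: nodes j, k
    are linked when the off-diagonal |G[j, k]| exceeds the threshold delta."""
    p = G.shape[0]
    adj = (np.abs(G) > delta) & ~np.eye(p, dtype=bool)
    seen, comps = set(), []
    for start in range(p):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:                      # depth-first search from `start`
            j = stack.pop()
            if j in seen:
                continue
            seen.add(j)
            comp.append(int(j))
            stack.extend(np.flatnonzero(adj[j]))
        comps.append(sorted(comp))
    return comps

def screen_and_clean(X, y, delta=0.1, lam=2.0):
    """Fit each small GOSD component separately by exhaustive best-subset
    least squares with an L0-style penalty lam per selected variable.
    (Exact only when components are orthogonal; a toy 'clean' step.)"""
    n, p = X.shape
    G = X.T @ X / n                       # Gram matrix, diagonals ~ 1
    beta_hat = np.zeros(p)
    for comp in gosd_components(G, delta):
        Xc = X[:, comp]
        # Empty model as the baseline: RSS = ||y||^2, zero penalty.
        best_score, best_coef, best_S = np.sum(y ** 2), None, ()
        for k in range(1, len(comp) + 1):
            for S in combinations(range(len(comp)), k):
                Xs = Xc[:, S]
                coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
                score = np.sum((y - Xs @ coef) ** 2) + lam * k
                if score < best_score:
                    best_score, best_coef, best_S = score, coef, S
        for idx, j in enumerate(best_S):
            beta_hat[comp[j]] = best_coef[idx]
    return beta_hat

def hamming_sign_error(beta_hat, beta):
    """The paper's loss: Hamming distance between the sign vectors."""
    return int(np.sum(np.sign(beta_hat) != np.sign(beta)))
```

Note the combinatorial search is feasible only because the GOSD decomposes the p variables into many small components; this is exactly the signal-sparsity/graph-sparsity interplay the abstract describes.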
Year
2014
DOI
10.5555/2627435.2697054
Venue
Journal of Machine Learning Research
Keywords
graphlet screening, screen and clean, phase diagram, graph of least favorables, graph of strong dependence, hamming distance, asymptotic minimaxity, sparsity, rare and weak signal model
Field
Diagonal, Row, Minimax, Feature selection, Linear model, Hamming distance, Rate of convergence, Artificial intelligence, Gramian matrix, Mathematics, Machine learning
DocType
Journal
Volume
15
Issue
1
ISSN
1532-4435
Citations
1
PageRank
0.36
References
10
Authors
3
Name            Order  Citations  PageRank
Jiashun Jin     1      114        7.75
Cun-Hui Zhang   2      174        18.38
Q Zhang         3      1          0.69