Title
Stability Based Sparse LSI/PCA: Incorporating Feature Selection in LSI and PCA
Abstract
The stability of sample-based algorithms is a concept commonly used for parameter tuning and validity assessment. In this paper we focus on two well-studied algorithms, Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA), and propose a feature selection process that provably guarantees the stability of their outputs. Features are selected so that the (statistical) accuracy of the LSI/PCA input matrices is adequate for computing meaningful (stable) eigenvectors. The feature selection process "sparsifies" LSI/PCA: the instances are projected onto the eigenvectors of a principal submatrix of the original input matrix, producing sparse factor loadings that are linear combinations solely of the selected features. We use bootstrap confidence intervals to assess the statistical accuracy of the input sample matrices, and matrix perturbation theory to relate this statistical accuracy to the stability of the eigenvectors. Experiments on several UCI datasets empirically verify our approach.
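The pipeline outlined in the abstract can be sketched in dependency-free Python for the PCA case only, with a covariance matrix as input. The steps are: bootstrap confidence intervals on the input matrix, a perturbation-style stability test against the eigengap, then PCA on the principal submatrix of the kept features.

This is a minimal sketch under stated assumptions, not the authors' algorithm. All of the following are hypothetical illustrative choices, and the paper's actual criterion may differ:
- the percentile bootstrap;
- the Frobenius norm of the CI half-widths as the perturbation estimate;
- the stopping rule `pert < gap / 2`, loosely inspired by eigengap-based bounds such as the Davis-Kahan sin-theta theorem;
- greedy backward elimination;
- the names `covariance`, `bootstrap_halfwidths`, `top_eig` and `stable_sparse_pca`.

```python
import math
import random

# Hypothetical sketch: the stability criterion, elimination order and all
# function names are illustrative assumptions, not the paper's method.


def covariance(rows, cols):
    """Sample covariance matrix (n - 1 denominator) of the columns `cols`."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in cols]
    k = len(cols)
    cov = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a, k):
            s = sum((r[cols[a]] - means[a]) * (r[cols[b]] - means[b]) for r in rows)
            cov[a][b] = cov[b][a] = s / (n - 1)
    return cov


def bootstrap_halfwidths(rows, n_boot=200, alpha=0.05, seed=0):
    """Half-width of a percentile bootstrap CI for every covariance entry."""
    rng = random.Random(seed)
    d = len(rows[0])
    cols = list(range(d))
    # Resample rows with replacement and recompute the full covariance matrix.
    boots = [covariance([rng.choice(rows) for _ in rows], cols) for _ in range(n_boot)]
    lo = int(alpha / 2 * n_boot)
    hi = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    half = [[0.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(d):
            vals = sorted(c[a][b] for c in boots)
            half[a][b] = (vals[hi] - vals[lo]) / 2
    return half


def top_eig(m, iters=500, seed=1):
    """Leading eigenpair of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    k = len(m)
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]  # random start avoids orthogonality traps
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # m annihilates v (e.g. zero matrix): eigenvalue 0
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * m[i][j] * v[j] for i in range(k) for j in range(k))
    return lam, v


def stable_sparse_pca(rows, n_boot=200, alpha=0.05, seed=0):
    """Drop features until a (hypothetical) stability test passes.

    Returns the kept feature indices and the leading eigenvector of the
    corresponding principal submatrix, i.e. a sparse loading vector.
    """
    half = bootstrap_halfwidths(rows, n_boot, alpha, seed)
    selected = list(range(len(rows[0])))
    while True:
        cov = covariance(rows, selected)
        lam1, v1 = top_eig(cov)
        if len(selected) == 1:
            return selected, v1
        k = len(selected)
        # Deflate to estimate the second eigenvalue and hence the eigengap.
        deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(k)] for i in range(k)]
        lam2, _ = top_eig(deflated)
        # Perturbation estimate: Frobenius norm of CI half-widths on the submatrix.
        pert = math.sqrt(sum(half[a][b] ** 2 for a in selected for b in selected))
        if pert < (lam1 - lam2) / 2:  # assumed threshold, not from the paper
            return selected, v1
        # Otherwise drop the feature whose covariance row is least accurate.
        worst = max(selected, key=lambda a: sum(half[a][b] ** 2 for b in selected))
        selected.remove(worst)


if __name__ == "__main__":
    rng = random.Random(42)
    data = []
    for _ in range(80):
        z = rng.gauss(0.0, 3.0)  # shared latent factor
        data.append([z + rng.gauss(0, 0.3), z + rng.gauss(0, 0.3),
                     rng.gauss(0, 1.0), rng.gauss(0, 1.0) ** 3])  # last: heavy-tailed noise
    kept, loading = stable_sparse_pca(data)
    print("kept features:", kept)
    print("sparse loading:", [round(x, 3) for x in loading])
```

Running the module prints the kept feature indices and the loading for a synthetic dataset in which two features share a latent factor. The loading is defined only on the kept features and is implicitly zero on the dropped ones, mirroring the principal-submatrix idea. Power iteration with deflation keeps the sketch dependency-free; with NumPy one would normally call `numpy.linalg.eigh` instead.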
Year
2007
DOI
10.1007/978-3-540-74958-5_23
Venue
ECML
Keywords
incorporating feature selection, sparse LSI, linear combination, matrix perturbation theory, input sample matrix, original input matrix, feature selection process, PCA input matrix, statistical accuracy, bootstrapping confidence interval, selected feature, parameter tuning, confidence interval, feature selection, perturbation theory, eigenvectors
Field
Linear combination, Feature selection, Pattern recognition, Matrix (mathematics), Computer science, Bootstrapping, Bootstrapping (statistics), Artificial intelligence, Factor analysis, Eigenvalues and eigenvectors, Principal component analysis
DocType
Conference
Volume
4701
ISSN
0302-9743
Citations
5
PageRank
0.47
References
12
Authors
2
Name | Order | Citations | PageRank
Dimitrios Mavroeidis | 1 | 130 | 9.50
Michalis Vazirgiannis | 2 | 3942 | 268.00