Title
Classification and feature gene selection using the normalized maximum likelihood model for discrete regression
Abstract
This paper studies the problem of class discrimination based on the normalized maximum likelihood (NML) model for a nonlinear regression, where the nonlinearly transformed class labels, each taking M possible values, are assumed to be drawn from a multinomial trial process. The strength of minimum description length (MDL) methods in statistical inference is that they find the model structure, which, in this particular classification problem, amounts to finding the best set of feature genes. We first show that minimizing the codelength of the NML model over different sets of feature genes is a tractable problem. We then extend the model for selecting the feature genes to a completely defined classifier and check its classification error in a cross-validation experiment. The quantization process involved in obtaining the required entries in the model can also be evaluated with the NML description length. The new classification method is applied to leukemia class discrimination based on gene expression microarray data. We find classification errors as low as 0.03% with a quadruplet of binary quantized genes, which was top ranked by the NML description length. This description length of the class labels, obtained with various sets of feature genes in the nonlinear regression model, allows intuitive comparisons of nested feature sets.
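To make the idea of ranking feature sets by codelength concrete, here is a minimal Python sketch. It is not the paper's exact regression model. It assumes a simplified setting in which the samples are grouped by the joint value of the selected quantized features, and the class labels in each group are encoded with the standard multinomial NML code. The function names (`multinomial_normalizer`, `nml_codelength`, `codelength_given_features`, `best_feature_set`) and the toy data are hypothetical and serve only as illustration. The normalizer uses the known recurrence C(n, K+2) = C(n, K+1) + (n/K) C(n, K) for the multinomial NML normalizing sum.

```python
import math
from collections import Counter
from itertools import combinations


def multinomial_normalizer(n, m):
    """NML normalizing sum C(n, m) for n draws over m symbols."""
    if n == 0 or m == 1:
        return 1.0
    c_prev = 1.0  # C(n, 1)
    # C(n, 2) by direct summation over the count k of the first symbol.
    c_curr = sum(
        math.comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
        for k in range(n + 1)
    )
    # Recurrence: C(n, k + 2) = C(n, k + 1) + (n / k) * C(n, k).
    for k in range(1, m - 1):
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr


def nml_codelength(labels, m):
    """Codelength in bits of a label sequence under the multinomial NML code."""
    n = len(labels)
    counts = Counter(labels)
    ml_bits = -sum(c * math.log2(c / n) for c in counts.values())
    return ml_bits + math.log2(multinomial_normalizer(n, m))


def codelength_given_features(X, y, feature_idx, m):
    """Sum of per-cell NML codelengths; cells are joint feature values."""
    cells = {}
    for row, label in zip(X, y):
        key = tuple(row[j] for j in feature_idx)
        cells.setdefault(key, []).append(label)
    return sum(nml_codelength(lbls, m) for lbls in cells.values())


def best_feature_set(X, y, k, m):
    """Exhaustively pick the k-feature subset with the shortest codelength."""
    n_features = len(X[0])
    return min(
        combinations(range(n_features), k),
        key=lambda s: codelength_given_features(X, y, s, m),
    )


# Toy data (made up): three binary "genes", class label equal to gene 0.
X = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 1)]
y = [0, 0, 1, 1, 0, 1]
print(best_feature_set(X, y, k=1, m=2))
```

The exhaustive search over subsets is only practical for small k and a modest number of candidate genes. It is used here for clarity and is not a claim about how the paper organizes its search.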
Year: 2003
DOI: 10.1016/S0165-1684(02)00470-X
Venue: Signal Processing
Keywords: nested feature set, feature gene, model structure, feature gene selection, class label, minimum description length, discrete regression, gene expression, feature selection, class discrimination, classification, normalized maximum likelihood, normalized maximum likelihood model, classification error, nonlinear regression model, new classification method, nml description length, nml model, nonlinear regression, microarray data, difference set, likelihood, statistical inference, cross validation, gene selection
Field: Pattern recognition, Feature selection, Regression analysis, Minimum description length, Multinomial distribution, Nonlinear regression, Feature extraction, Artificial intelligence, Statistical inference, Class discrimination, Mathematics
DocType: Journal
Volume
Issue
ISSN
83
4
Signal Processing
Citations: 10
PageRank: 1.23
References: 7
Authors: 3
Name | Order | Citations | PageRank
Ioan Tabus | 1 | 276 | 38.23
Jorma Rissanen | 2 | 1665 | 798.14
Jaakko Astola | 3 | 1515 | 230.41