Title
On sensitivity of case-based reasoning to optimal feature subsets in business failure prediction
Abstract
Case-based reasoning (CBR) was first introduced into the area of business failure prediction (BFP) in 1996. The conclusion drawn in that first application was that CBR is no more applicable than multiple discriminant analysis (MDA) and Logit. However, some arguments claim that CBR, with k-nearest neighbor (k-NN) at its heart, is not necessarily outperformed by those machine learning techniques. In this research, we investigate whether CBR is sensitive to so-called optimal feature subsets in BFP, since the feature subset is an important factor in CBR's performance. When CBR is used to solve such a classification problem, mainly the retrieval step of its life-cycle is employed. We use the classical Euclidean metric to calculate case similarity. Empirical data from two years prior to failure are collected from the Shanghai Stock Exchange and the Shenzhen Stock Exchange in China. Four filters (the MDA stepwise method, the Logit stepwise method, one-way ANOVA, and the independent-samples t-test) and one wrapper (a genetic algorithm) are employed to generate five optimal feature subsets after data normalization. Predictive performance is assessed with a thirty-times hold-out method, which combines leave-one-out cross-validation with the hold-out method. The two statistical baseline models, i.e. MDA and Logit, and the newer model of support vector machine (SVM) are employed as comparative models. Empirical results indicate that CBR is indeed sensitive to optimal feature subsets in medium-term BFP. The stepwise method of MDA, a filter approach, is the first choice for selecting optimal feature subsets for CBR, followed by the stepwise method of Logit and the wrapper. The two filter approaches of ANOVA and the t-test are the fourth choice.
If the MDA stepwise method is employed to select the optimal feature subset for the CBR system, there is no significant difference in predictive performance for medium-term BFP between CBR and the other three models (MDA, Logit, and SVM). In contrast, CBR is outperformed by the three models at the 1% significance level if ANOVA or the t-test is used as the feature selection method for CBR.
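The retrieval step described in the abstract can be sketched as a plain Euclidean-distance k-NN vote over a case base of normalized feature vectors. This is a minimal illustration, not the authors' implementation; the function names and toy data are assumptions.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    # classical Euclidean metric between two normalized feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(case_base, query, k=3):
    # case_base: list of (feature_vector, label); label 1 = failed, 0 = healthy.
    # Retrieve the k most similar stored cases and take a majority vote.
    neighbors = sorted(case_base, key=lambda case: euclidean_distance(case[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy case base with two normalized financial-ratio features (illustrative values only)
case_base = [
    ([0.9, 0.8], 0), ([0.8, 0.9], 0),
    ([0.2, 0.1], 1), ([0.1, 0.3], 1),
]
print(knn_predict(case_base, [0.15, 0.2], k=3))  # → 1 (predicted failure)
```

In the paper's setting, the feature vectors would be the financial ratios chosen by the respective feature selection method, which is why the selected subset directly shapes the distances and hence the retrieval result.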
Year
DOI
Venue
2010
10.1016/j.eswa.2009.12.034
Expert Syst. Appl.
Keywords
Field
DocType
optimal feature subsets,cbr system,case-based reasoning (cbr),feature selection,feature selection method,filter approach,filters,wrappers,business failure prediction (bfp),medium-term bfp,chinese listed company,mda stepwise method,logit stepwise method,stepwise method,k-nearest neighbor,predictive performance,stock exchange,thirty-times hold-out method,leave-one-out cross-validation,genetic algorithm,machine learning,life cycle,multiple discriminant analysis,comparative modeling,support vector machine
Logit,k-nearest neighbors algorithm,Data mining,Feature selection,Computer science,Multiple discriminant analysis,Support vector machine,Euclidean distance,Artificial intelligence,Case-based reasoning,Machine learning,Database normalization
Journal
Volume
Issue
ISSN
37
7
Expert Systems With Applications
Citations 
PageRank 
References 
6
0.42
44
Authors
4
Name
Order
Citations
PageRank
Hui Li, 1, 472, 15.82
Hai-Bin Huang, 2, 25, 7.59
Jie Sun, 3, 374, 12.21
Chuang Lin, 4, 3040, 390.74