Title
Predicting problem difficulty for genetic programming applied to data classification
Abstract
During the development of applied systems, an important problem that must be addressed is that of choosing the correct tools for a given domain or scenario. This general task has been addressed by the genetic programming (GP) community by attempting to determine the intrinsic difficulty that a problem poses for a GP search. This paper presents an approach to predict the performance of GP applied to data classification, one of the most common problems in computer science. The novelty of the proposal is to extract statistical descriptors and complexity descriptors of the problem data, and from these estimate the expected performance of a GP classifier. We derive two types of predictive models: linear regression models and symbolic regression models evolved with GP. The experimental results show that both approaches provide good estimates of classifier performance, using synthetic and real-world problems for validation. In conclusion, this paper shows that it is possible to accurately predict the expected performance of a GP classifier using a set of descriptors that characterize the problem data.
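The abstract does not name the specific descriptors or the form of the fitted models, so the following is only a minimal sketch of the general idea: compute a descriptor from the problem data, then fit a linear model that maps descriptor values to observed classifier performance. The choice of Fisher's discriminant ratio as the descriptor, the single-descriptor model, and the function names `fisher_ratio`, `fit_line`, and `predict` are all assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio(xs, ys):
    """Fisher's discriminant ratio for one feature and two classes (labels 0/1).

    Assumption: used here as a stand-in complexity descriptor; the paper's
    actual descriptor set is not listed in the abstract.
    """
    a = [x for x, y in zip(xs, ys) if y == 0]
    b = [x for x, y in zip(xs, ys) if y == 1]
    denom = pvariance(a) + pvariance(b)
    # Higher values mean the class means are well separated relative to spread.
    return (mean(a) - mean(b)) ** 2 / denom if denom > 0 else float("inf")


def fit_line(descriptors, performance):
    """Ordinary least squares fit of performance = slope * descriptor + intercept."""
    md, mp = mean(descriptors), mean(performance)
    sxx = sum((d - md) ** 2 for d in descriptors)
    sxy = sum((d - md) * (p - mp) for d, p in zip(descriptors, performance))
    slope = sxy / sxx
    return slope, mp - slope * md


def predict(model, descriptor):
    """Estimate performance on an unseen problem from its descriptor value."""
    slope, intercept = model
    return slope * descriptor + intercept
```

In use, one would compute the descriptor for each problem in a training collection, pair it with the measured performance of the GP classifier on that problem, call `fit_line` on the pairs, and then call `predict` on the descriptor of a new problem. A model over several descriptors, or a symbolic regression model evolved with GP as the abstract mentions, would replace `fit_line` in the same pipeline.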
Year
2011
DOI
10.1145/2001576.2001759
Venue
GECCO
Keywords
data classification, expected performance, complexity descriptors, important problem, predicting problem difficulty, common problem, genetic programming, problem data, gp classifier, real-world problem, classifier performance, gp search, prediction model, linear regression model, classification
Field
Data mining, Mathematical optimization, Computer science, Genetic programming, Artificial intelligence, Novelty, Data classification, Classifier (linguistics), Performance prediction, Symbolic regression, Machine learning, Linear regression
DocType
Conference
Citations
9
PageRank
0.56
References
21
Authors
4
Name                  Order  Citations  PageRank
Leonardo Trujillo     1      444        38.12
Yuliana Martínez      2      42         5.70
Edgar Galván-López    3      246        13.34
Pierrick Legrand      4      90         16.20