Title
Feature engineering for improving robustness of crossover in symbolic regression
Abstract
Isolating the fitness contribution of substructures is typically a difficult task in Genetic Programming (GP). Hence, useful substructures are lost when the overall structure (model) performs poorly. Furthermore, while crossover is heavily used in GP, it typically produces offspring models with significantly lower fitness than that of the parents. In symbolic regression, this degradation also occurs because the coefficients of an evolving model lose utility after crossover. This paper proposes isolating the fitness contribution of various substructures and reducing the negative impact of crossover by evolving a set of features instead of monolithic models. The method then leverages multiple linear regression (MLR) to optimise the coefficients of these features. Since adding new features cannot degrade the accuracy of an MLR-produced model, MLR-aided GP models can bloat. To penalise such additions, we use adjusted R² as the fitness function. The paper compares the proposed method with standard GP and GP with linear scaling. Experimental results show that the proposed method matches the accuracy of the competing methods within only one-tenth of the number of generations. The method also significantly decreases the rate of post-crossover fitness degradation.
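The core idea in the abstract (fit coefficients for a set of features with MLR, then score the result with adjusted R² so that extra features are penalised) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`solve_linear_system`, `fit_mlr`, `r_squared`, `adjusted_r2`), the hand-picked features, the target function, and the noise level are all assumptions chosen for illustration, and in the actual method the features would be evolved by GP rather than written by hand.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(features, xs, ys):
    """Least-squares fit of y ~ b0 + sum_k b_k * f_k(x) via normal equations."""
    rows = [[1.0] + [f(x) for f in features] for x in xs]  # intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coefs = solve_linear_system(xtx, xty)
    preds = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    return coefs, preds


def r_squared(ys, preds):
    """Coefficient of determination on the fitted data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2: penalises each additional feature (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical data: a noisy target built from two "useful" features.
rng = random.Random(0)
xs = [-3 + 0.2 * i for i in range(31)]
ys = [1 + 2 * x * x + 3 * math.sin(x) + rng.gauss(0, 0.5) for x in xs]

small = [lambda x: x * x, math.sin]            # hand-picked stand-ins for evolved features
large = small + [lambda x: math.cos(3 * x)]    # one extra, irrelevant feature

_, preds_small = fit_mlr(small, xs, ys)
_, preds_large = fit_mlr(large, xs, ys)
r2_small, r2_large = r_squared(ys, preds_small), r_squared(ys, preds_large)
adj_small = adjusted_r2(r2_small, len(xs), len(small))
adj_large = adjusted_r2(r2_large, len(xs), len(large))
print(f"R2: {r2_small:.4f} -> {r2_large:.4f}; adjusted: {adj_small:.4f} -> {adj_large:.4f}")
```

In this sketch, plain R² on the fitted data cannot decrease when a feature is added, which is why an MLR-aided fitness based on it would reward bloat. Adjusted R² rises only if the extra feature reduces the residual by more than the lost degree of freedom costs.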
Year
2020
DOI
10.1145/3377929.3390078
Venue
GECCO '20: Genetic and Evolutionary Computation Conference, Cancún, Mexico, July 2020
DocType
Conference
ISBN
978-1-4503-7127-8
Citations
0
PageRank
0.34
References
0
Authors
5