Dynamic Grammar Pruning for Program Size Reduction in Symbolic Regression.

Muhammad Sarmad Ali, Meghana Kshirsagar, Enrique Naredo, Conor Ryan
Author Information
  1. Muhammad Sarmad Ali, Department of Computer Science and Information Systems, University of Limerick, Castletroy, Limerick, V94 T9PX, Ireland
  2. Meghana Kshirsagar, Department of Computer Science and Information Systems, University of Limerick, Castletroy, Limerick, V94 T9PX, Ireland
  3. Enrique Naredo, Department of Computer Science and Information Systems, University of Limerick, Castletroy, Limerick, V94 T9PX, Ireland
  4. Conor Ryan, Department of Computer Science and Information Systems, University of Limerick, Castletroy, Limerick, V94 T9PX, Ireland

Abstract

Grammar is a key input in grammar-based genetic programming, and its design influences not only performance but also program size. However, grammar design and the choice of productions often require expert input, as no automatic approach exists. This work presents an approach to automatically reduce a bloated grammar. Using a simple Production Ranking mechanism, we identify less useful productions and dynamically prune them, channelling the evolutionary search towards better (smaller) solutions. Our objective is program size reduction without compromising generalization performance. We tested the approach on 13 standard symbolic regression datasets with Grammatical Evolution. Taking a grammar that embodies a well-defined function set as the baseline, we compare effective genome length and test performance against our approach. Dynamic grammar pruning achieved significantly shorter genome lengths on all datasets; generalization performance improved significantly on three datasets but worsened on five. When we applied linear scaling during the production ranking stages (the first 20 generations), the results improved dramatically: not only were the programs smaller on all datasets, but generalization scores were also significantly better than the baseline on 6 of the 13 datasets and comparable on the rest. When the baseline was linearly scaled as well, the Production Ranking approach still produced smaller programs, with generalization scores dropping on only three datasets and no significant compromise on the rest.
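The two ingredients the abstract describes — ranking productions by how useful they are to fit individuals, then pruning the lowest-ranked ones, with linearly scaled fitness during the ranking stages — can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function names, the credit formula (crediting each production used by an individual with 1/(1+error)), and the keep-fraction parameter are all hypothetical.

```python
def rank_productions(grammar, population):
    """Score each production by aggregate credit from individuals that use it.

    `grammar` maps a non-terminal to its list of productions; `population`
    is a list of (productions_used, error) pairs, lower error being better.
    Hypothetical scoring rule: each individual credits every production it
    uses with 1 / (1 + error), so productions favoured by fitter individuals
    accumulate higher scores.
    """
    scores = {p: 0.0 for prods in grammar.values() for p in prods}
    for used, error in population:
        credit = 1.0 / (1.0 + error)
        for p in set(used):
            if p in scores:
                scores[p] += credit
    return scores


def prune_grammar(grammar, scores, keep_fraction=0.7):
    """Drop the lowest-ranked productions, keeping at least one per rule."""
    pruned = {}
    for nt, prods in grammar.items():
        ranked = sorted(prods, key=lambda p: scores[p], reverse=True)
        keep = max(1, int(round(keep_fraction * len(prods))))
        pruned[nt] = ranked[:keep]
    return pruned


def linear_scale(y_pred, y_true):
    """Least-squares linear scaling: find (a, b) minimizing sum of
    squared error of a + b * y_pred against y_true."""
    n = len(y_pred)
    my = sum(y_pred) / n
    mt = sum(y_true) / n
    var = sum((y - my) ** 2 for y in y_pred)
    if var == 0.0:           # constant predictions: slope is undefined
        return mt, 0.0
    b = sum((y - my) * (t - mt) for y, t in zip(y_pred, y_true)) / var
    return mt - b * my, b
```

In this sketch, `rank_productions` and `prune_grammar` would be called periodically during the ranking stages, with `linear_scale` applied to each individual's predictions before computing its error, so that ranking reflects model shape rather than raw offset and slope.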


