Improved regression tree models using generalization error-based splitting criteria

Ying Yang, Shuaian Wang, Gilbert Laporte

Research output: Contribution to journal › Article › peer-review

Abstract

Despite the widespread application of machine learning (ML) approaches such as the regression tree (RT) in the field of data-driven optimization, overfitting may impair the effectiveness of ML models and thus hinder the deployment of ML for decision-making. In particular, we address the overfitting of the traditional RT splitting criterion, which considers only the training mean squared error, when the sample size is limited, and we accurately specify the mathematical formula for the generalization error. We introduce two novel splitting criteria based on generalization error, which offer higher-quality approximations of the generalization error than the traditional training error does. One criterion is formulated through a mathematical derivation based on the RT model, and the second is established through leave-one-out cross-validation (LOOCV). We construct RT models using our proposed generalization error-based splitting criteria on extensive ML benchmark instances and report the experimental results, including the models' computational efficiency, prediction accuracy, and robustness. Our findings support the superior efficacy and robustness of the RT model based on the refined LOOCV-informed splitting criterion, marking substantial improvements over those of the traditional RT model. Additionally, our tree structure analysis provides insights into how our proposed LOOCV-informed splitting criterion guides the model in striking a balance between a complex tree structure and accurate predictions.
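
The contrast between a training-error criterion and a LOOCV-informed one can be illustrated with a minimal sketch. This is not the criterion derived in the article. It assumes each leaf predicts the mean of its targets, for which the leave-one-out residual of observation i has the closed form (y_i - mean) * n / (n - 1). The function names `sse`, `loocv_sse`, and `best_split`, and the toy data, are hypothetical and chosen only for illustration.

```python
def sse(ys):
    """Training error of a leaf: sum of squared deviations from the leaf mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def loocv_sse(ys):
    """Leave-one-out squared error of a leaf that predicts the mean.

    Holding out y_i and predicting it with the mean of the remaining
    n - 1 targets gives the residual (y_i - mean) * n / (n - 1), so the
    total is the training SSE scaled by (n / (n - 1)) ** 2.
    A single-observation leaf cannot be validated, so it scores infinity
    (an assumption of this sketch).
    """
    n = len(ys)
    if n < 2:
        return float("inf")
    return sse(ys) * (n / (n - 1)) ** 2


def best_split(xs, ys, leaf_error):
    """Return (threshold, error) for the best split on one feature.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, leaf_error([y for _, y in pairs]))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        err = leaf_error(left) + leaf_error(right)
        if err < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, err)
    return best


# Toy data (hypothetical): three similar targets and one outlying target.
xs = [1, 2, 3, 4]
ys = [1.0, 1.1, 0.9, 3.0]

train_threshold, train_err = best_split(xs, ys, sse)
loocv_threshold, loocv_err = best_split(xs, ys, loocv_sse)
print(train_threshold)  # 3.5: training error isolates the last point
print(loocv_threshold)  # None: the LOOCV-style score declines to split
```

On this small sample the training-error criterion isolates a single observation in its own leaf, whereas the leave-one-out score penalizes small leaves and keeps the node unsplit, which is the kind of complexity-versus-accuracy trade-off the abstract describes.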

Original language: English
Journal: Naval Research Logistics
Early online date: 10 Jun 2025
Publication status: E-pub ahead of print - 10 Jun 2025

Data Availability Statement

The data supporting the findings of this study are openly available in the Data-sets repository at https://github.com/ShadowY1998/Data-sets

Keywords

  • generalization error
  • leave-one-out cross-validation
  • mean squared error
  • regression tree

ASJC Scopus subject areas

  • Modelling and Simulation
  • Ocean Engineering
  • Management Science and Operations Research
