
Greedy knot selection algorithm for restricted cubic spline regression

  • UiT The Arctic University of Norway

Research output: Contribution to journal › Article › peer-review

30 Scopus citations

Abstract

Non-linear regression modeling is common in epidemiology, both for prediction and for estimating relationships between predictor and response variables. Restricted cubic spline (RCS) regression is one such method and is, for example, highly relevant to Cox proportional hazards regression analysis. RCS regression uses third-order polynomials joined at knot points to model non-linear relationships. The standard approach is to place knots at a regular sequence of quantiles between the outer boundaries. A regression curve can easily be fitted to the sample using a relatively high number of knots, but the risk is then overfitting, where a regression model fits the given sample well but does not generalize to other samples. A low knot count is thus preferred. However, the standard knot selection process can lead to underperformance in the sparser regions of the predictor variable, especially when using a low number of knots. It can also lead to overfitting in the denser regions. We present a simple greedy search algorithm using a backward method for knot selection that shows reduced prediction error and Bayesian information criterion scores compared to the standard knot selection process in simulation experiments. We have implemented the algorithm as part of an open-source R package, knutar.
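
To make the idea in the abstract concrete, the following is a minimal Python sketch of backward greedy knot selection for an RCS fit. It is not the knutar implementation, and several details are assumptions made only for illustration: the function names (rcs_basis, bic, quantile_knots, greedy_backward_knots), ordinary least squares instead of a Cox model, Harrell's truncated-power form of the RCS basis, outer knots fixed at the 5th and 95th percentiles, ten starting knots, and a Gaussian BIC computed up to an additive constant.

```python
import numpy as np

def rcs_basis(x, knots):
    """Design matrix for a restricted cubic spline: intercept, x, and k-2
    non-linear terms (truncated-power form, linear beyond the outer knots)."""
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    pos3 = lambda u: np.maximum(u, 0.0) ** 3      # truncated cubic (u)_+^3
    scale = (t[-1] - t[0]) ** 2                   # keeps columns on the scale of x
    gap = t[-1] - t[-2]
    cols = [np.ones_like(x), x]
    for j in range(len(t) - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / gap
                + pos3(x - t[-1]) * (t[-2] - t[j]) / gap)
        cols.append(term / scale)
    return np.column_stack(cols)

def bic(x, y, knots):
    """Gaussian BIC (up to a constant) of a least-squares RCS fit."""
    X = rcs_basis(x, knots)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n = len(y)
    return n * np.log(rss / n) + X.shape[1] * np.log(n)

def quantile_knots(x, k):
    """Standard placement: k knots at a regular sequence of quantiles.
    The 5%/95% outer boundaries are an assumption of this sketch."""
    return np.quantile(x, np.linspace(0.05, 0.95, k))

def greedy_backward_knots(x, y, n_start=10, n_min=3):
    """Start from many quantile knots and repeatedly drop the inner knot
    whose removal gives the lowest BIC; return the best knot set seen."""
    knots = list(quantile_knots(x, n_start))
    best_knots, best_score = list(knots), bic(x, y, knots)
    while len(knots) > n_min:
        # Candidate removals: every inner knot (outer knots stay fixed).
        trials = [(bic(x, y, knots[:i] + knots[i + 1:]), i)
                  for i in range(1, len(knots) - 1)]
        score, i = min(trials)
        knots.pop(i)
        if score < best_score:
            best_knots, best_score = list(knots), score
    return np.array(best_knots), best_score

# Usage on simulated data with a skewed predictor (sparse right tail).
rng = np.random.default_rng(0)
x = rng.exponential(1.0, 300)
y = np.sin(2 * x) + rng.normal(0, 0.3, 300)
knots, score = greedy_backward_knots(x, y)
print(len(knots), "knots selected")
```

Because each step only compares single-knot removals, the search is greedy: it evaluates on the order of n_start squared models rather than every subset of knots, at the cost of possibly missing the globally best subset.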

Original language: English
Article number: 1283705
Journal: Frontiers in Epidemiology
Volume: 3
State: Published - 2023

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • algorithm
  • model selection
  • non-linear regression
  • prediction
  • restricted cubic splines
