Constrain to perform: Regularization of habitat models

Björn Reineking, Boris Schröder

Research output: Contribution to journal › Article › peer-review

116 Scopus citations

Abstract

Predictive habitat models are an important tool for ecological research and conservation. A major cause of unreliable models is excessive model complexity, and regularization methods aim to improve the predictive performance by adequately constraining model complexity. We compare three regularization methods for logistic regression: variable selection, lasso, and ridge. They differ in the way model complexity is measured: variable selection uses the number of estimated parameters, the lasso uses the sum of the absolute values of the parameter estimates, and the ridge uses the sum of the squared values of the parameter estimates. We performed a simulation study with environmental data of a real landscape and artificial species occupancy data. We investigated the effect of three factors on relative model performance: (1) the number of parameters (16, 10, 6, 2) in the 'true' model that determined the distribution of the artificial species, (2) the prevalence, i.e. the proportion of sites occupied by the species, and (3) the sample size (measured in events per variable, EPV). Regularization improved model discrimination and calibration. However, no regularization method performed best under all circumstances: the ridge generally performed best in the 16-parameter scenario. The lasso generally performed best in the 10-parameter scenario. Variable selection with AIC was best at large sample sizes (EPV≥10) when less than half of the variables influenced the species distribution. However, at low sample sizes (EPV<10), ridge and lasso always performed best, regardless of the parameter scenario or prevalence. Overall, calibration was best in ridge models. Other methods showed overconfidence, particularly at low sample sizes. The percentage of correctly identified models was low for both lasso and variable selection. Variable selection should be used with caution. Although it can produce the best performing models under certain conditions, these situations are difficult to infer from the data. Ridge and lasso are risk-averse model strategies that can be expected to perform well under a wide range of underlying species-habitat relationships, particularly at small sample sizes.
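The three complexity measures named in the abstract can be made concrete as penalties on a logistic-regression log-likelihood. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: all function names (complexity, fit_penalized_logistic, aic, events_per_variable) are hypothetical, the intercept is left unpenalized, the penalty strength lam is fixed by hand rather than tuned (e.g. by cross-validation), lasso and ridge are fitted by simple proximal gradient descent, and EPV is computed with the common definition (events in the rarer outcome class per candidate parameter), which is assumed rather than taken from the paper. For variable selection only the AIC criterion is shown, not a search over subsets.

```python
import numpy as np

def complexity(beta, kind):
    """Complexity of a coefficient vector (intercept excluded)."""
    b = np.asarray(beta, dtype=float)
    if kind == "subset":  # variable selection: number of estimated parameters
        return int(np.count_nonzero(b))
    if kind == "lasso":   # sum of absolute values (L1 norm)
        return float(np.sum(np.abs(b)))
    if kind == "ridge":   # sum of squared values (squared L2 norm)
        return float(np.sum(b ** 2))
    raise ValueError(f"unknown kind: {kind}")

def neg_log_lik(b0, beta, X, y):
    """Negative Bernoulli log-likelihood of a logistic model."""
    eta = b0 + X @ beta
    # log(1 + exp(eta)) - y * eta, computed without overflow
    return float(np.sum(np.logaddexp(0.0, eta) - y * eta))

def aic(b0, beta, X, y):
    """AIC = 2 * NLL + 2k, with k = nonzero slopes + intercept."""
    k = complexity(beta, "subset") + 1
    return 2.0 * neg_log_lik(b0, beta, X, y) + 2.0 * k

def fit_penalized_logistic(X, y, lam, kind, lr=0.1, n_iter=5000):
    """Minimize mean NLL + lam * penalty by (proximal) gradient descent.

    Hypothetical helper; assumes standardized predictors so that a fixed
    step size lr is stable.
    """
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - y
        b0 -= lr * resid.mean()             # intercept is not penalized
        grad = X.T @ resid / n
        if kind == "ridge":
            beta = beta - lr * (grad + 2.0 * lam * beta)
        elif kind == "lasso":
            z = beta - lr * grad
            # soft-thresholding: the proximal step for the L1 penalty
            beta = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)
        else:
            raise ValueError(f"unknown kind: {kind}")
    return b0, beta

def events_per_variable(y, n_params):
    """Events in the rarer outcome class per candidate parameter."""
    n_pos = int(np.sum(y))
    return min(n_pos, len(y) - n_pos) / n_params

# Usage on simulated data: 200 sites, 6 standardized predictors,
# only the first two of which affect the (artificial) species.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 6))
true_beta = np.array([1.5, -1.0, 0.0, 0.0, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + X @ true_beta)))).astype(float)
print("EPV:", events_per_variable(y, X.shape[1]))
for kind in ("lasso", "ridge"):
    b0, beta = fit_penalized_logistic(X, y, lam=0.05, kind=kind)
    print(kind, np.round(beta, 2), "complexity:", round(complexity(beta, kind), 3))
```

The design choice mirrors the abstract's framing: all three methods trade log-likelihood against a complexity measure, and they differ only in that measure. The L1 penalty can set coefficients exactly to zero (through soft-thresholding), whereas the squared penalty shrinks all coefficients smoothly without removing any.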

Original language: English
Pages (from-to): 675-690
Number of pages: 16
Journal: Ecological Modelling
Volume: 193
Issue number: 3-4
State: Published, 15 Mar 2006
Externally published: Yes

Keywords

  • Habitat models
  • Lasso
  • Logistic regression
  • Model selection
  • Penalized maximum likelihood
  • Prediction
  • Regularization
  • Ridge
  • Variable selection

