Subsampling based variable selection for generalized linear models. Academic Article

Overview

abstract

  • A novel variable selection method for low-dimensional generalized linear models is introduced. The new approach, called AIC OPTimization via STABility Selection (OPT-STABS), repeatedly subsamples the data, minimizes Akaike's Information Criterion (AIC) over a sequence of nested models for each subsample, and includes in the final model those predictors that are selected in the minimum-AIC model in a large fraction of the subsamples. New methods are also introduced to establish an optimal variable selection cutoff over repeated subsamples. An extensive simulation study examining a variety of proposed variable selection methods shows that, although no single method uniformly outperforms the others in all the scenarios considered, OPT-STABS is consistently among the best-performing methods in most settings and performs competitively in the rest. This is in contrast to other candidate methods, which either perform poorly across the board or perform well in some settings but very poorly in others. In addition, the asymptotic properties of the OPT-STABS estimator are derived, and its root-n consistency and asymptotic normality are proved. The methods are applied to two datasets involving logistic and Poisson regressions.
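The subsample, minimize-AIC, and count-selections loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It makes several assumptions that the abstract does not state: a Gaussian linear model stands in for the general GLM, the nested sequence is built by ranking predictors by absolute marginal correlation with the response on each subsample, half of the data is drawn per subsample, and the cutoff is a fixed, arbitrary 0.6 rather than the optimal cutoff the paper proposes. All function and parameter names (`ols_rss`, `abs_corr`, `stability_select`, `n_sub`, `frac`, `cutoff`) are hypothetical.

```python
import math
import random


def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit (an intercept is added here)."""
    n = len(y)
    A = [[1.0] + list(row) for row in X]  # design matrix with intercept column
    k = len(A[0])
    # Normal equations (A'A) b = A'y, stored as an augmented k x (k+1) matrix.
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(k)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[r][k] / M[r][r] for r in range(k)]
    return sum((y[i] - sum(b * a for b, a in zip(beta, A[i]))) ** 2 for i in range(n))


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def stability_select(X, y, n_sub=50, frac=0.5, cutoff=0.6, seed=0):
    """Return (selected predictor indices, per-predictor selection frequencies)."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    m = int(frac * n)  # subsample size (assumed: half of the data)
    counts = [0] * p
    for _ in range(n_sub):
        idx = rng.sample(range(n), m)  # subsample without replacement
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        # Assumed ordering: rank predictors by marginal correlation on this subsample.
        order = sorted(range(p), key=lambda j: -abs_corr([r[j] for r in Xs], ys))
        best_aic, best_k = math.inf, 0
        for k in range(p + 1):  # nested models: the first k ranked predictors
            rss = ols_rss([[r[j] for j in order[:k]] for r in Xs], ys)
            # Gaussian AIC up to an additive constant; k slopes + intercept + variance.
            aic = m * math.log(rss / m) + 2 * (k + 2)
            if aic < best_aic:
                best_aic, best_k = aic, k
        for j in order[:best_k]:  # predictors in the minimum-AIC model
            counts[j] += 1
    freq = [c / n_sub for c in counts]
    return [j for j in range(p) if freq[j] >= cutoff], freq


# Synthetic demo: only predictors 0 and 1 carry signal.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [2.0 * row[0] - 1.5 * row[1] + data_rng.gauss(0, 1) for row in X]
selected, freq = stability_select(X, y)
print(selected, freq)
```

For logistic or Poisson regression, as in the paper's applications, the least-squares fit and Gaussian AIC would be replaced by a maximum-likelihood GLM fit and its AIC; the subsampling and counting logic would stay the same.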

publication date

  • March 11, 2023

Identity

PubMed Central ID

  • PMC10118238

Scopus Document Identifier

  • 85150769482

Digital Object Identifier (DOI)

  • 10.1016/j.csda.2023.107740

PubMed ID

  • 37090139

Additional Document Info

volume

  • 184