Type: Package
Title: Augmented Backward Elimination
Version: 5.1.2
Date: 2025-4-1
Author: Rok Blagus [aut, cre], Sladana Babic [ctb], Daniela Dunkler [ctb], Georg Heinze [ctb], Gregor Steiner [ctb]
Maintainer: Rok Blagus <rok.blagus@mf.uni-lj.si>
Description: Performs augmented backward elimination and checks the stability of the obtained model. Augmented backward elimination combines significance or information based criteria with the change in estimate to either select the optimal model for prediction purposes or to serve as a tool to obtain a practically sound, highly interpretable model. More details can be found in Dunkler et al. (2014) <doi:10.1371/journal.pone.0113677>.
License: GPL-3
Depends: R (>= 4.1.0)
RoxygenNote: 7.3.2
Encoding: UTF-8
Imports: ggplot2 (>= 3.4.0), reshape2 (>= 1.4.0), tidytext (>= 0.4.0), survival (>= 3.4-0), foreach (>= 1.5.0), lifecycle (>= 1.0.0)
Suggests: testthat (>= 3.0.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-04-03 10:28:31 UTC; rblagus
Repository: CRAN
Date/Publication: 2025-04-03 10:50:06 UTC

abe: Augmented Backward Elimination

Description

Performs augmented backward elimination and checks the stability of the obtained model. Augmented backward elimination combines significance or information based criteria with the change in estimate to either select the optimal model for prediction purposes or to serve as a tool to obtain a practically sound, highly interpretable model. More details can be found in Dunkler et al. (2014) doi:10.1371/journal.pone.0113677.

Author(s)

Maintainer: Rok Blagus <rok.blagus@mf.uni-lj.si>

Other contributors: Sladana Babic, Daniela Dunkler, Georg Heinze, Gregor Steiner


Augmented Backward Elimination

Description

Function 'abe' performs augmented backward elimination (ABE), in which variable selection is based on the change-in-estimate criterion combined with significance or information criteria, as presented in Dunkler et al. (2014) doi:10.1371/journal.pone.0113677. Turning off the change-in-estimate criterion reduces the procedure to backward elimination based on significance or information criteria only.
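The backward-elimination-only mode can be pictured with base R alone. The sketch below is not the package's implementation: 'backward_eliminate' is a hypothetical helper that repeatedly calls 'drop1', removes the least significant term while its p-value exceeds 'alpha', and never removes the terms listed in 'keep' (playing the role of 'include'). The noise level is lowered to 'sd = 1' for illustration.

```r
# Simulated data in the style of the package examples (assumption: sd = 1)
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 1)
dd <- data.frame(y, x1, x2, x3)
full <- lm(y ~ x1 + x2 + x3, data = dd)

# Hypothetical helper: plain backward elimination by significance.
backward_eliminate <- function(fit, alpha = 0.2, keep = character(0)) {
  repeat {
    tests <- drop1(fit, test = "Chisq")
    # drop the "<none>" row and the protected terms
    tests <- tests[!(rownames(tests) %in% c("<none>", keep)), , drop = FALSE]
    if (nrow(tests) == 0) break
    p <- tests[["Pr(>Chi)"]]
    worst <- which.max(p)
    if (p[worst] <= alpha) break          # every candidate is significant
    fit <- update(fit, as.formula(paste(". ~ . -", rownames(tests)[worst])))
  }
  fit
}

be_fit <- backward_eliminate(full, alpha = 0.2, keep = "x1")
kept <- attr(terms(be_fit), "term.labels")
print(kept)
```

With the package itself, the corresponding request is a call to 'abe' with 'tau = Inf', as in the Examples section.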

Usage

abe(
  fit,
  data = NULL,
  include = NULL,
  active = NULL,
  tau = 0.05,
  exact = FALSE,
  criterion = c("alpha", "AIC", "BIC"),
  alpha = 0.2,
  type.test = c("Chisq", "F", "Rao", "LRT"),
  type.factor = NULL,
  verbose = TRUE,
  ...
)

Arguments

fit

An object of class '"lm"', '"glm"', '"logistf"', '"coxph"', or '"survreg"' representing the fit. Note, the functions should be fitted with argument 'x=TRUE' and 'y=TRUE' (or 'model=TRUE' for '"logistf"' objects).

data

data frame used when fitting the object 'fit'.

include

a vector containing the names of variables that will be included in the final model. These variables are used only as passive variables during modeling. They might be exposure variables of interest or known confounders. They will never be dropped from the working model in the selection process, but they will be used passively in evaluating the change-in-estimate criteria of other variables. Note that variables which are not specified as 'include' or 'active' are assumed to be both active and passive.

active

a vector containing the names of active variables. These less important explanatory variables will be used only as active, not as passive, variables when evaluating the change-in-estimate criterion.

tau

Value that specifies the threshold of the relative change-in-estimate criterion. Default is set to 0.05.

exact

Logical, specifies if the method will use exact change-in-estimate or its approximation. Default is set to FALSE, which means that the method will use the approximation proposed by Dunkler et al. (2014). Note, setting to TRUE can severely slow down the algorithm, but setting to FALSE can in some cases, i.e., if dummy variables of a factor are evaluated together, lead to a poor approximation of the change-in-estimate criterion. See details.

criterion

String that specifies the strategy to select variables for the blacklist. Currently supported options are significance level '"alpha"', Akaike information criterion '"AIC"' and Bayesian information criterion '"BIC"'. If you use the significance level, you have to specify the value of 'alpha' (see parameter 'alpha') and the type of the test statistic (see parameter 'type.test'). Default is set to '"alpha"'.

alpha

Value that specifies the level of significance as explained above. Default is set to 0.2.

type.test

String that specifies which test should be performed when 'criterion = "alpha"'. Possible values are '"F"' and '"Chisq"' (default) for class '"lm"'; '"Rao"', '"LRT"', '"Chisq"' (default) and '"F"' for class '"glm"'; and '"Chisq"' for class '"coxph"'. See also 'drop1'.

type.factor

String that specifies how to treat factors. Possible values are '"factor"' and '"individual"'; see details.

verbose

Logical that specifies if the variable selection process should be printed. This can severely slow down the algorithm. Default is set to TRUE.

...

Further arguments. Currently, this is primarily used to warn users about arguments that are no longer supported.

Details

Using the default settings, 'abe' performs augmented backward elimination based on significance, with the significance level set to 0.2. All variables are treated as "passive or active", the approximated change-in-estimate is used, and the threshold of the relative change-in-estimate criterion is 0.05. Setting 'tau' to a very large number (e.g. 'Inf') turns off the change-in-estimate criterion, so that ABE performs plain backward elimination. Specifying 'alpha = 0' will include variables only because of the change-in-estimate criterion, as then variables are not safe from exclusion because of their p-values. Specifying 'alpha = 1' will always include all variables.
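To make the roles of 'tau' and 'exact' concrete, the following base R sketch computes the change in the coefficient of a passive variable when an active variable is dropped, once by refitting and once from the covariance matrix of the estimates. It is an illustration under stated assumptions, not the package's internal code: the approximation formula and the standardization by 'sd(X)/sd(y)' follow our reading of Dunkler et al. (2014), and all object names are hypothetical. For an ordinary linear model the two versions coincide up to numerical error.

```r
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, data = dd)

passive <- "x1"   # variable whose coefficient is monitored
active  <- "x2"   # candidate for removal

# Exact change-in-estimate: refit without the active variable
fit_drop <- update(fit, . ~ . - x2)
delta_exact <- unname(coef(fit_drop)[passive] - coef(fit)[passive])

# Approximated change-in-estimate from the covariance matrix of the estimates
V <- vcov(fit)
delta_approx <- unname(-coef(fit)[active] * V[passive, active] / V[active, active])

# Standardized criterion for a linear model (assumed form), compared with tau
tau <- 0.05
cie <- abs(delta_exact) * sd(dd[[passive]]) / sd(dd$y)
retain_active <- cie >= tau

print(c(exact = delta_exact, approx = delta_approx, cie = cie))
```

If 'retain_active' is 'TRUE', removing 'x2' would change the estimate for 'x1' by more than the threshold, which is the situation in which ABE keeps a variable despite a large p-value.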

When using 'type.factor="individual"' each dummy variable of a factor is treated as an individual explanatory variable, hence only this dummy variable can be removed from the model. Use sensible coding for the reference group. Using 'type.factor="factor"' will look at the significance of removing all dummy variables of the factor and can drop the entire variable from the model. If 'type.factor="factor"' then 'exact' should be set to 'TRUE' to avoid poor approximations.
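The difference between the two 'type.factor' settings can be seen in base R by comparing a joint test of a factor with the per-dummy rows of the coefficient table. This is only a sketch of the idea: the Wald tests from 'summary' stand in for whatever test the package applies to individual dummies, and the data are constructed so that 'x4' has exactly four levels.

```r
set.seed(1)
n <- 100
dd <- data.frame(x1 = runif(n), x2 = runif(n), x4 = rep(0:3, length.out = n))
dd$y <- -5 + 5 * dd$x1 + 5 * dd$x2 + rnorm(n, sd = 5)
fit <- lm(y ~ x1 + x2 + factor(x4), data = dd)

# "factor" view: one joint test for all dummy variables of x4
joint <- drop1(fit, test = "Chisq")
joint_df <- joint["factor(x4)", "Df"]

# "individual" view: each dummy variable is a separate column and coefficient
dummy_cols <- grep("^factor\\(x4\\)", colnames(model.matrix(fit)), value = TRUE)
per_dummy <- summary(fit)$coefficients[dummy_cols, , drop = FALSE]

print(joint)
print(per_dummy)
```

Because the level coded 0 is the reference group here, each dummy coefficient is a contrast against that level, which is why a sensible reference coding matters under the '"individual"' option.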

In earlier versions, abe used to include an exp.beta argument. This is not supported anymore. Instead, the function now uses the exponential change-in-estimate for logistic, Cox, and parametric survival models only.
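For a logistic model, the exponential change-in-estimate can be sketched as follows. The form '|exp(delta * sd(X)) - 1|' is an assumption based on our reading of Dunkler et al. (2014), the object names are hypothetical, and the code is not taken from the package. Unlike in the linear case, the approximation and the refitted (exact) value generally differ.

```r
set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)                 # x2 is correlated with x1
yb <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.8 * x2))
dd <- data.frame(yb, x1, x2)
fit <- glm(yb ~ x1 + x2, family = binomial, data = dd)

# Exact change in the coefficient of x1 when x2 is dropped
fit_drop <- update(fit, . ~ . - x2)
delta_exact <- unname(coef(fit_drop)["x1"] - coef(fit)["x1"])

# Approximation based on the covariance matrix of the estimates
V <- vcov(fit)
delta_approx <- unname(-coef(fit)["x2"] * V["x1", "x2"] / V["x2", "x2"])

# Exponential (odds-ratio scale) change-in-estimate, assumed standardization
cie_exact  <- abs(exp(delta_exact  * sd(dd$x1)) - 1)
cie_approx <- abs(exp(delta_approx * sd(dd$x1)) - 1)

print(c(exact = cie_exact, approx = cie_approx))
```

Either value would then be compared with 'tau' in the same way as for a linear model.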

Value

An object of class '"lm"', '"glm"', '"coxph"', or '"survreg"' representing the model chosen by the ABE method.

Author(s)

Rok Blagus, rok.blagus@mf.uni-lj.si

Daniela Dunkler

Gregor Steiner

Sladana Babic

References

Daniela Dunkler, Max Plischke, Karen Leffondré, and Georg Heinze. Augmented Backward Elimination: A Pragmatic and Purposeful Way to Develop Statistical Models. PLoS ONE, 9(11):e113677, 2014, doi:10.1371/journal.pone.0113677.

See Also

abe.resampling, lm, glm and coxph

Examples

# simulate some data:

set.seed(1)
n = 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)

# fit a simple model containing all variables
fit1 <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

# perform ABE with "x1" as passive only and "x2" as active only,
# using the exact change-in-estimate with a threshold of 0.05
# and a significance level of 0.2
abe.fit <- abe(fit1, data = dd, include = "x1", active = "x2",
tau = 0.05, exact = TRUE, criterion = "alpha", alpha = 0.2,
type.test = "Chisq", verbose = TRUE)

summary(abe.fit)

# similar example, but turn off the change-in-estimate and perform
# only backward elimination

be.fit <- abe(fit1, data = dd, include = "x1", active = "x2",
tau = Inf, exact = TRUE, criterion = "alpha", alpha = 0.2,
type.test = "Chisq", verbose = TRUE)

summary(be.fit)

# an example with the model containing categorical covariates:
dd$x4 <- rbinom(n, size = 3, prob = 1/3)
dd$y1 <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
fit2 <- lm(y1 ~ x1 + x2 + factor(x4), x = TRUE, y = TRUE, data = dd)

# treat "x4" as a single covariate: perform ABE as in abe.fit

abe.fit.fact <- abe(fit2, data = dd, include = "x1", active = "x2",
tau = 0.05, exact = TRUE, criterion = "alpha", alpha = 0.2,
type.test = "Chisq", verbose = TRUE, type.factor = "factor")

summary(abe.fit.fact)

# treat each dummy of "x4" as a separate covariate: perform ABE as in abe.fit

abe.fit.ind <- abe(fit2, data = dd, include = "x1", active = "x2",
tau = 0.05, exact = TRUE, criterion = "alpha", alpha = 0.2,
type.test = "Chisq", verbose = TRUE, type.factor = "individual")

summary(abe.fit.ind)

Bootstrapped Augmented Backward Elimination

Description

This function is deprecated; use 'abe.resampling' instead.

Performs Augmented backward elimination on re-sampled datasets using different bootstrap and re-sampling techniques.

Usage

abe.boot(
  fit,
  data = NULL,
  include = NULL,
  active = NULL,
  tau = 0.05,
  exp.beta = TRUE,
  exact = FALSE,
  criterion = "alpha",
  alpha = 0.2,
  type.test = "Chisq",
  type.factor = NULL,
  num.boot = 100,
  type.boot = c("bootstrap", "mn.bootstrap", "subsampling"),
  prop.sampling = 0.5
)

Arguments

fit

An object of class '"lm"', '"glm"' or '"coxph"' representing the fit. Note that the model should be fitted with arguments 'x=TRUE' and 'y=TRUE'.

data

data frame used when fitting the object 'fit'.

include

a vector containing the names of variables that will be included in the final model. These variables are used as passive variables during modeling. These variables might be exposure variables of interest or known confounders. They will never be dropped from the working model in the selection process, but they will be used passively in evaluating change-in-estimate criteria of other variables. Note, variables which are not specified as include or active in the model fit are assumed to be active and passive variables.

active

a vector containing the names of active variables. These less important explanatory variables will only be used as active, but not as passive variables when evaluating the change-in-estimate criterion.

tau

Value that specifies the threshold of the relative change-in-estimate criterion. Default is set to 0.05.

exp.beta

Logical specifying if exponent is used in formula to standardize the criterion. Default is set to TRUE.

exact

Logical, specifies if the method will use the exact change-in-estimate or its approximation. Default is set to FALSE, which means that the method will use the approximation proposed by Dunkler et al. (2014). Note, setting to TRUE can severely slow down the algorithm, but setting to FALSE can in some cases lead to a poor approximation of the change-in-estimate criterion.

criterion

String that specifies the strategy to select variables for the blacklist. Currently supported options are significance level '"alpha"', Akaike information criterion '"AIC"' and Bayesian information criterion '"BIC"'. If you use the significance level, you have to specify the value of 'alpha' (see parameter 'alpha'). Default is set to '"alpha"'.

alpha

Value that specifies the level of significance as explained above. Default is set to 0.2.

type.test

String that specifies which test should be performed when 'criterion = "alpha"'. Possible values are '"F"' and '"Chisq"' (default) for class '"lm"'; '"Rao"', '"LRT"', '"Chisq"' (default) and '"F"' for class '"glm"'; and '"Chisq"' for class '"coxph"'. See also 'drop1'.

type.factor

String that specifies how to treat factors. Possible values are '"factor"' and '"individual"'; see details.

num.boot

number of bootstrap re-samples

type.boot

String that specifies the type of bootstrap. Possible values are '"bootstrap"', '"mn.bootstrap"' and '"subsampling"'; see details.

prop.sampling

Sampling proportion. Only applicable for 'type.boot="mn.bootstrap"' and 'type.boot="subsampling"', defaults to 0.5. See details.

Details

Used only for compatibility with previous versions and will be removed at some point; use 'abe.resampling' instead.

Value

an object of class 'abe' for which 'summary', 'plot' and 'pie.abe' functions are available. A list with the following elements:

'models' the final models obtained after performing ABE on re-sampled datasets, each object in the list is of the same class as 'fit'

'alpha' the vector of significance levels used

'tau' the vector of threshold values for the change-in-estimate

'num.boot' number of re-sampled datasets

'criterion' criterion used when constructing the black-list

'all.vars' a list of variables used when estimating 'fit'

'fit.or' the initial model

'misc' the parameters of the call to 'abe.boot'

Author(s)

Rok Blagus, rok.blagus@mf.uni-lj.si

Daniela Dunkler

Sladana Babic

References

Daniela Dunkler, Max Plischke, Karen Leffondré, and Georg Heinze. Augmented Backward Elimination: A Pragmatic and Purposeful Way to Develop Statistical Models. PLoS ONE, 9(11):e113677, 2014.

Riccardo De Bin, Silke Janitza, Willi Sauerbrei and Anne-Laure Boulesteix. Subsampling versus Bootstrapping in Resampling-Based Model Selection for Multivariable Regression. Biometrics 72, 272-280, 2016.

See Also

abe.resampling


ABE for model which includes categorical covariates, factor option

Description

ABE for model which includes categorical covariates, factor option

Usage

abe.fact1(
  fit,
  data,
  include = NULL,
  active = NULL,
  tau = 0.05,
  exp.beta = TRUE,
  exact = FALSE,
  criterion = "alpha",
  alpha = 0.2,
  type.test = "Chisq",
  verbose = TRUE
)

Examples

## Not run: 
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

abe.fit <- abe.fact1(fit, data = dd, include = "x1", active = "x2",
tau = 0.05, exp.beta = FALSE, exact = TRUE, criterion = "alpha", alpha = 0.2,
type.test = "Chisq", verbose = FALSE)
summary(abe.fit)

## End(Not run)

ABE for model which includes categorical covariates, factor option, bootstrap version

Description

ABE for model which includes categorical covariates, factor option, bootstrap version

Usage

abe.fact1.boot(
  fit,
  data,
  include = NULL,
  active = NULL,
  tau = 0.05,
  exp.beta = TRUE,
  exact = FALSE,
  criterion = "alpha",
  alpha = 0.2,
  type.test = "Chisq",
  k
)

Examples

## Not run: 
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

abe.fit <- abe.fact1.boot(fit, data = dd, include = "x1", active = "x2",
tau = 0.05, exp.beta = FALSE, exact = TRUE, criterion = "alpha", alpha = 0.2,
type.test = "Chisq", k = 2)
summary(abe.fit)

## End(Not run)

ABE for model which includes categorical covariates, individual option

Description

ABE for model which includes categorical covariates, individual option

Usage

abe.fact2(
  fit,
  data,
  include = NULL,
  active = NULL,
  tau = 0.05,
  exp.beta = TRUE,
  exact = FALSE,
  criterion = "alpha",
  alpha = 0.2,
  type.test = "Chisq",
  verbose = TRUE
)

Examples

## Not run: 
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

abe.fit <- abe.fact2(fit, data = dd, include = "x1", active = "x2",
tau = 0.05, exp.beta = FALSE, exact = TRUE, criterion = "alpha", alpha = 0.2,
type.test = "Chisq", verbose = FALSE)
summary(abe.fit)

## End(Not run)

ABE for model which includes categorical covariates, individual option, bootstrap version

Description

ABE for model which includes categorical covariates, individual option, bootstrap version

Usage

abe.fact2.boot(
  fit,
  data,
  include = NULL,
  active = NULL,
  tau = 0.05,
  exp.beta = TRUE,
  exact = FALSE,
  criterion = "alpha",
  alpha = 0.2,
  type.test = "Chisq",
  k
)

Examples

## Not run: 
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

abe.fit <- abe.fact2.boot(fit, data = dd, include = "x1", active = "x2",
tau = 0.05, exp.beta = FALSE, exact = TRUE, criterion = "alpha", alpha = 0.2,
type.test = "Chisq", k = 2)
summary(abe.fit)

## End(Not run)

ABE for models which include only numeric covariates

Description

ABE for models which include only numeric covariates

Usage

abe.num(
  fit,
  data,
  include = NULL,
  active = NULL,
  tau = 0.05,
  exp.beta = TRUE,
  exact = FALSE,
  criterion = "alpha",
  alpha = 0.2,
  type.test = "Chisq",
  verbose = TRUE
)

Examples

## Not run: 
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

abe.fit <- abe.num(fit, data = dd, include = "x1", active = "x2",
tau = 0.05, exact = TRUE, criterion = "alpha", alpha = 0.2,
type.test = "Chisq", verbose = FALSE)
summary(abe.fit)

## End(Not run)

ABE for models which include only numeric covariates, bootstrap version

Description

ABE for models which include only numeric covariates, bootstrap version

Usage

abe.num.boot(
  fit,
  data,
  include = NULL,
  active = NULL,
  tau = 0.05,
  exp.beta = TRUE,
  exact = FALSE,
  criterion = "alpha",
  alpha = 0.2,
  type.test = "Chisq",
  k
)

Examples

## Not run: 
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

abe.fit <- abe.num.boot(fit, data = dd, include = "x1", active = "x2",
tau = 0.05, exp.beta = FALSE, exact = TRUE, criterion = "alpha", alpha = 0.2,
type.test = "Chisq", k = 2)

summary(abe.fit)

## End(Not run)

Resampled Augmented Backward Elimination

Description

Performs Augmented backward elimination on re-sampled data sets using different bootstrap and re-sampling techniques.

Usage

abe.resampling(
  fit,
  data = NULL,
  include = NULL,
  active = NULL,
  tau = 0.05,
  exact = FALSE,
  criterion = c("alpha", "AIC", "BIC"),
  alpha = 0.2,
  type.test = c("Chisq", "F", "Rao", "LRT"),
  type.factor = NULL,
  num.resamples = 100,
  type.resampling = c("Wallisch2021", "bootstrap", "mn.bootstrap", "subsampling"),
  prop.sampling = 0.5,
  save.out = c("minimal", "complete"),
  parallel = FALSE,
  seed = NULL,
  ...
)

Arguments

fit

An object of class '"lm"', '"glm"', '"logistf"', '"coxph"', or '"survreg"' representing the fit. Note, the functions should be fitted with argument 'x=TRUE' and 'y=TRUE' (or 'model=TRUE' for '"logistf"' objects).

data

data frame used when fitting the object 'fit'.

include

a vector containing the names of variables that will be included in the final model. These variables are used as passive variables during modeling. These variables might be exposure variables of interest or known confounders. They will never be dropped from the working model in the selection process, but they will be used passively in evaluating change-in-estimate criteria of other variables. Note, variables which are not specified as include or active in the model fit are assumed to be active and passive variables.

active

a vector containing the names of active variables. These less important explanatory variables will only be used as active, but not as passive variables when evaluating the change-in-estimate criterion.

tau

Value that specifies the threshold of the relative change-in-estimate criterion. Default is set to 0.05.

exact

Logical, specifies if the method will use the exact change-in-estimate or its approximation. Default is set to FALSE, which means that the method will use the approximation proposed by Dunkler et al. (2014). Note, setting to TRUE can severely slow down the algorithm, but setting to FALSE can in some cases lead to a poor approximation of the change-in-estimate criterion.

criterion

String that specifies the strategy to select variables for the blacklist. Currently supported options are significance level '"alpha"', Akaike information criterion '"AIC"' and Bayesian information criterion '"BIC"'. If you use the significance level, you have to specify the value of 'alpha' (see parameter 'alpha'). Default is set to '"alpha"'.

alpha

Value that specifies the level of significance as explained above. Default is set to 0.2.

type.test

String that specifies which test should be performed when 'criterion = "alpha"'. Possible values are '"F"' and '"Chisq"' (default) for class '"lm"'; '"Rao"', '"LRT"', '"Chisq"' (default) and '"F"' for class '"glm"'; and '"Chisq"' for class '"coxph"'. See also 'drop1'.

type.factor

String that specifies how to treat factors. Possible values are '"factor"' and '"individual"'; see details.

num.resamples

number of resamples.

type.resampling

String that specifies the type of resampling. Possible values are '"Wallisch2021"', '"bootstrap"', '"mn.bootstrap"', '"subsampling"'. Default is set to '"Wallisch2021"'. See details.

prop.sampling

Sampling proportion. Only applicable for 'type.resampling="mn.bootstrap"' and 'type.resampling="subsampling"', defaults to 0.5. See details.

save.out

String that specifies whether only the minimal output of the refitted models ('save.out="minimal"') or the entire object ('save.out="complete"') is to be saved. Defaults to '"minimal"'.

parallel

Logical, specifies if the calculations should be run in parallel ('TRUE') or not ('FALSE'). Defaults to 'FALSE'. See details.

seed

Numeric, a random seed to be used to form re-sampled datasets. Defaults to 'NULL'. Can be used to assure complete reproducibility of the results, see Examples.

...

Further arguments. Currently, this is primarily used to warn users about arguments that are no longer supported.

Details

'type.resampling' can be '"bootstrap"' (n observations drawn from the original data with replacement), '"mn.bootstrap"' (m out of n observations drawn from the original data with replacement), '"subsampling"' (m out of n observations drawn from the original data without replacement) or '"Wallisch2021"'; here m is 'prop.sampling*n'. When using '"Wallisch2021"' the resampling is done twice: first using the bootstrap (these results are contained in 'models') and then using resampling with 'prop.sampling' equal to 0.5 (these results are contained in 'models.wallisch'); see Wallisch et al. (2021).
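The resampling schemes described above differ only in how row indices are drawn. The base R sketch below illustrates this; 'resample_rows' is a hypothetical helper, not a function of the package, and rounding m down with 'floor' is an assumption.

```r
# Hypothetical helper: draw row indices under each resampling scheme.
resample_rows <- function(n,
                          type = c("bootstrap", "mn.bootstrap", "subsampling"),
                          prop.sampling = 0.5) {
  type <- match.arg(type)
  m <- floor(prop.sampling * n)   # assumption: m is rounded down
  switch(type,
    bootstrap    = sample.int(n, n, replace = TRUE),   # n out of n, with replacement
    mn.bootstrap = sample.int(n, m, replace = TRUE),   # m out of n, with replacement
    subsampling  = sample.int(n, m, replace = FALSE))  # m out of n, without replacement
}

set.seed(87982)
n <- 100
id_boot <- resample_rows(n, "bootstrap")
id_mn   <- resample_rows(n, "mn.bootstrap", prop.sampling = 0.5)
id_sub  <- resample_rows(n, "subsampling", prop.sampling = 0.5)

# A "Wallisch2021"-style pair: one bootstrap sample and one 50% subsample
id_wallisch <- list(id1 = resample_rows(n, "bootstrap"),
                    id2 = resample_rows(n, "subsampling", prop.sampling = 0.5))

print(lengths(list(boot = id_boot, mn = id_mn, sub = id_sub)))
```

A model would then be refitted on, for example, 'dd[id_sub, ]' before the selection procedure is run on that resample.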

When using 'parallel=TRUE', a parallel backend must be registered before calling 'abe.resampling'. The available parallel backends are system-specific; see 'foreach' for more details.

In earlier versions, abe used to include an exp.beta argument. This is not supported anymore. Instead, the function now uses the exponential change in estimate for logistic and Cox models only.

Value

an object of class 'abe' for which 'summary', 'plot' and 'pie.abe' functions are available. A list with the following elements:

'coefficients' a matrix of coefficients of the final models obtained after performing ABE on re-sampled datasets; if using 'type.resampling="Wallisch2021"', these models are obtained by using bootstrap.

'coefficients.wallisch' if using 'type.resampling="Wallisch2021"' the coefficients of the final models obtained after performing ABE using resampling with 'prop.sampling' equal to 0.5; 'NULL' when using any other option in 'type.resampling'.

'models' the final models obtained after performing ABE on re-sampled datasets, each object in the list is of the same class as 'fit'; if using 'type.resampling="Wallisch2021"', these models are obtained by using bootstrap. These are only returned if 'save.out = "complete"'.

'models.wallisch' similar to 'models'; if using 'type.resampling="Wallisch2021"', the coefficients and terms of the final models obtained after performing ABE using resampling with 'prop.sampling' equal to 0.5; 'NULL' when using any other option in 'type.resampling'. These are only returned if 'save.out = "complete"'.

'model.parameters' a data frame of alpha and tau values corresponding to the resampled models.

'num.boot' number of resampled datasets

'criterion' criterion used when constructing the black-list

'all.vars' a list of variables used when estimating 'fit'

'fit.global' the initial model. In earlier versions of the package this parameter was called 'fit.or'.

'misc' the parameters of the call to 'abe.resampling'

'id' the rows of the data which were used when refitting the model; a list with elements 'id1' (the rows used to refit the model; when 'type.resampling="Wallisch2021"' these are based on the bootstrap) and 'id2' ('NULL' unless 'type.resampling="Wallisch2021"', in which case these are the rows used to refit the models based on subsampling).

Author(s)

Rok Blagus, rok.blagus@mf.uni-lj.si

Daniela Dunkler

Sladana Babic

References

Daniela Dunkler, Max Plischke, Karen Leffondré, and Georg Heinze. Augmented Backward Elimination: A Pragmatic and Purposeful Way to Develop Statistical Models. PLoS ONE, 9(11):e113677, 2014, doi:10.1371/journal.pone.0113677.

Riccardo De Bin, Silke Janitza, Willi Sauerbrei, and Anne-Laure Boulesteix. Subsampling versus Bootstrapping in Resampling-Based Model Selection for Multivariable Regression. Biometrics 72, 272-280, 2016, doi:10.1111/biom.12381.

Christine Wallisch, Daniela Dunkler, Geraldine Rauch, Riccardo De Bin, and Georg Heinze. Selection of Variables for Multivariable Models: Opportunities and Limitations in Quantifying Model Stability by Resampling. Statistics in Medicine 40:369-381, 2021, doi:10.1002/sim.8779.

See Also

abe, summary.abe, print.abe, plot.abe, pie.abe

Examples

# simulate some data and fit a model

set.seed(1)
n = 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

# use ABE on 10 re-samples considering different
# change-in-estimate thresholds and significance levels

fit.resample1 <- abe.resampling(fit, data = dd, include = "x1",
active = "x2", tau = c(0.05, 0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 10, type.resampling = "Wallisch2021")

names(summary(fit.resample1))
summary(fit.resample1)$var.rel.frequencies
summary(fit.resample1)$model.rel.frequencies
summary(fit.resample1)$var.coefs[1]
summary(fit.resample1)$pair.rel.frequencies[1]
print(fit.resample1)

# use ABE on 10 bootstrap re-samples considering different
# change-in-estimate thresholds and significance levels

fit.resample2 <- abe.resampling(fit, data = dd, include = "x1",
active = "x2", tau = c(0.05, 0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 10, type.resampling = "bootstrap")

summary(fit.resample2)

# use ABE on 10 subsamples randomly selecting 50% of subjects
# considering different change-in-estimate thresholds and
# significance levels

fit.resample3 <- abe.resampling(fit, data = dd, include = "x1",
active = "x2", tau = c(0.05,0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 10, type.resampling = "subsampling", prop.sampling = 0.5)

summary(fit.resample3)

# ensure reproducibility of the results

fit.resample.1 <- abe.resampling(fit,  data = dd, include = "x1",
active = "x2", tau = c(0.05, 0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 10, type.resampling = "Wallisch2021")

fit.resample.2 <- abe.resampling(fit, data = dd, include = "x1",
active = "x2", tau = c(0.05, 0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 10, type.resampling = "Wallisch2021")

# since no seed is supplied, fit.resample.1 and fit.resample.2 will generally give different results

fit.resample.3 <- abe.resampling(fit, data = dd, include = "x1",
active = "x2", tau = c(0.05, 0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 10, type.resampling = "Wallisch2021", seed = 87982)

fit.resample.4 <- abe.resampling(fit, data = dd, include = "x1",
active = "x2", tau = c(0.05, 0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 10, type.resampling = "Wallisch2021", seed = 87982)

# now fit.resample.3 and fit.resample.4 give exactly the same results

# example of parallel computation on Windows, using all but 2 cores

#library(doParallel)
#N_CORES <- detectCores()
#cl <- makeCluster(N_CORES-2)
#registerDoParallel(cl)
#fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
#tau = c(0.05, 0.1), exact = TRUE, criterion = "alpha", alpha = c(0.2, 0.05),
#type.test = "Chisq", num.resamples = 50, type.resampling = "Wallisch2021",
#parallel = TRUE)
#stopCluster(cl)

grep function changed

Description

grep function changed

Usage

my_grep(...)

Examples

## Not run: 
my_grep("x",c("xy","xz","ab"))

## End(Not run)

grepl function changed

Description

grepl function changed

Usage

my_grepl(...)

Examples

## Not run: 
my_grepl("x",c("xy","xz","ab"))

## End(Not run)

update function which searches for objects within the parent environment

Description

update function which searches for objects within the parent environment

Usage

my_update(mod, formula = NULL, data = NULL)

Examples

## Not run: 
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

ddn <- dd[-1, ]
my_update(fit, data = ddn)
my_update(fit, formula = as.formula(". ~ . - x1"), data = ddn)

## End(Not run)

update function which searches for objects within the parent environment, gives a nicer output than my_update

Description

update function which searches for objects within the parent environment, gives a nicer output than my_update

Usage

my_update2(mod, formula = NULL, data = NULL, data.n = NULL)

Examples

## Not run: 
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

ddn <- dd[-1, ]
my_update2(fit, data = ddn, data.n = "ddn")
my_update2(fit, formula = as.formula(". ~ . - x1"), data = ddn, data.n = "ddn")

## End(Not run)

update function which searches for objects within the parent environment, bootstrap version, i.e. can only update the model based on a new dataset

Description

update function which searches for objects within the parent environment, bootstrap version, i.e. can only update the model based on a new dataset

Usage

my_update_boot(mod, data = NULL)

Examples

## Not run: 
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y, x1, x2, x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

ddn <- dd[-1, ]
my_update_boot(fit, data = ddn)

## End(Not run)

Pie Function

Description

Pie function for the resampled/bootstrapped version of ABE. Plots a pie chart of the model frequencies for specified values of 'alpha' and 'tau'.

Usage

pie.abe(x, alpha = NULL, tau = NULL, labels = NA, ...)

Arguments

x

an object of class '"abe"', as returned by a call to 'abe.resampling'

alpha

values of alpha for which the plot is to be made (can be a vector of length >1)

tau

values of tau for which the plot is to be made (can be a vector of length >1)

labels

plot labels, defaults to NA, i.e. no labels are plotted

...

Arguments to be passed to methods, such as graphical parameters (see 'pie', 'barplot', 'hist').

Details

When using 'type.resampling="Wallisch2021"' the plot is based on subsampling with sampling proportion equal to 0.5, otherwise as specified in 'type.resampling'.
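The quantity shown in such a pie chart is the relative frequency with which each distinct model was selected across resamples. A minimal base R sketch follows; the coefficient matrix 'coefs' is made up for illustration (one row per resample, zero meaning "not selected") and the code does not use the package.

```r
# Hypothetical coefficients from four resamples; 0 marks an excluded variable
coefs <- matrix(c(4.8, 5.1, 0,
                  5.3, 0,   0,
                  4.9, 4.7, 0,
                  5.0, 5.2, 1.1),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("x1", "x2", "x3")))

# Label each resample by the set of variables in its final model
model_id <- apply(coefs != 0, 1,
                  function(z) paste(colnames(coefs)[z], collapse = "+"))

# Relative frequency of each selected model
model_freq <- prop.table(table(model_id))
print(model_freq)

# Draw the pie chart without labels (null device, so no file is written)
pdf(NULL)
pie(model_freq, labels = NA)
invisible(dev.off())
```

Here the model with 'x1' and 'x2' is selected in half of the resamples and the other two models in a quarter each.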

Author(s)

Rok Blagus, rok.blagus@mf.uni-lj.si

Sladana Babic

See Also

abe.resampling, summary.abe, plot.abe

Examples

set.seed(10)
n = 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
tau = c(0.05, 0.1), exact = TRUE, criterion = "alpha", alpha = c(0.2, 0.05),
type.test = "Chisq", num.resamples = 50, type.resampling = "Wallisch2021")

pie.abe(fit.resample, alpha = 0.2, tau = 0.1)

fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
tau = c(0.05, 0.1), exact = TRUE, criterion = "alpha", alpha = c(0.2, 0.05),
type.test = "Chisq", num.resamples = 50, type.resampling = "subsampling")

pie.abe(fit.resample, alpha = 0.2, tau = 0.1)

Plot Function

Description

Plot function for the resampled/bootstrapped version of ABE.

Usage

## S3 method for class 'abe'
plot(
  x,
  type.plot = c("coefficients", "variables", "models", "stability", "pairwise"),
  alpha = NULL,
  tau = NULL,
  variable = NULL,
  type.stability = c("alpha", "tau"),
  pval = 0.01,
  ...
)

Arguments

x

an object of class '"abe"', an object returned by a call to [abe.resampling()]

type.plot

string which specifies the type of the plot. See details.

alpha

values of alpha for which the plot is to be made (can be a vector of length >1)

tau

values of tau for which the plot is to be made (can be a vector of length >1)

variable

variables for which the plot is to be made (can be a vector of length >1)

type.stability

string which specifies the type of stability plot. See details.

pval

significance level to be used to determine a significant deviation from the expected pairwise inclusion frequency under independence (default 0.01). Only relevant if 'type.plot="pairwise"'.

...

Arguments to be passed to methods, such as graphical parameters.

Details

When using 'type.plot="coefficients"' the function plots a histogram of the estimated regression coefficients for the specified variables, alpha(s) and tau(s) obtained from different re-sampled datasets. When the variable is not included in the final model, its regression coefficient is set to zero. When using 'type.resampling="Wallisch2021"' the plot is based on bootstrap, otherwise as specified in 'type.resampling'.

When using 'type.plot="variables"' the function plots a barplot of the relative inclusion frequencies of the specified variables, for the specified values of alpha and tau. When using 'type.resampling="Wallisch2021"' the plot is based on subsampling with sampling proportion equal to 0.5, otherwise as specified in 'type.resampling'.

When using 'type.plot="models"' the function plots a barplot of the relative frequencies of the final models for specified alpha(s) and tau(s). When using 'type.resampling="Wallisch2021"' the plot is based on subsampling with sampling proportion equal to 0.5, otherwise as specified in 'type.resampling'.

When using 'type.plot="stability"' the function plots variable inclusion frequencies for each value of alpha. 'type.stability' specifies if inclusion frequencies should be plotted as a function of alpha (default) or tau.

When using 'type.plot="pairwise"' the function plots a heatmap of differences between observed pairwise inclusion frequencies and the expected pairwise inclusion frequencies under independence. A high value indicates overselection, i.e. the pair of variables is selected together more often than expected under independence. Selection frequencies (in

Author(s)

Rok Blagus, rok.blagus@mf.uni-lj.si

Sladana Babic

Daniela Dunkler

Gregor Steiner

See Also

abe.resampling, summary.abe, pie.abe

Examples

set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
tau = c(0.05, 0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 50, type.resampling = "Wallisch2021")

plot(fit.resample, type.plot = "coefficients",
alpha = 0.2, tau = 0.1, variable = c("x1", "x3"),
col = "light blue")

plot(fit.resample, type.plot = "variables",
alpha = 0.2, tau = 0.1, variable = c("x1", "x2", "x3"),
col = "light blue", horiz = TRUE, las = 1)

par(mar = c(4, 6, 4, 2))
plot(fit.resample, type.plot = "models",
alpha = 0.2, tau = 0.1, col = "light blue", horiz = TRUE, las = 1)

fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
tau = c(0.05, 0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 50, type.resampling = "bootstrap")

plot(fit.resample, type.plot = "coefficients",
alpha = 0.2, tau = 0.1, variable = c("x1", "x3"),
col = "light blue")

fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
tau = c(0.05, 0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 50, type.resampling = "subsampling")

plot(fit.resample, type.plot = "variables",
alpha = 0.2, tau = 0.1, variable = c("x1", "x2", "x3"),
col = "light blue", horiz = TRUE, las = 1)

par(mar = c(4, 6, 4, 2))
plot(fit.resample, type.plot = "models",
alpha = 0.2, tau = 0.1, col = "light blue", horiz = TRUE, las = 1)
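
The "stability" and "pairwise" plot types described in Details could be requested along the following lines. This is a sketch that relies only on the arguments listed under Usage; whether 'alpha' and 'tau' must be supplied for the stability plot is not stated in this page, so they are left at their defaults here as an assumption.

```r
library(abe)

set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
  tau = c(0.05, 0.1), exact = TRUE, criterion = "alpha", alpha = c(0.2, 0.05),
  type.test = "Chisq", num.resamples = 50, type.resampling = "Wallisch2021")

# inclusion frequencies as a function of alpha (the default), then of tau
plot(fit.resample, type.plot = "stability", type.stability = "alpha")
plot(fit.resample, type.plot = "stability", type.stability = "tau")

# heatmap of observed minus expected pairwise inclusion frequencies
plot(fit.resample, type.plot = "pairwise", alpha = 0.2, tau = 0.1, pval = 0.01)
```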

Print Function

Description

Prints a summary table of a bootstrapped/resampled version of ABE. The table displays the relative inclusion frequencies of the covariates from the initial model, the coefficient estimates and standard errors from the initial model (model with all covariates), the selected model, resampled median and percentiles for the estimates of the regression coefficients for each variable from the initial model, root mean squared difference ratio (RMSD) and relative bias conditional on selection (RBCS), see 'details'.

Usage

## S3 method for class 'abe'
print(
  x,
  type = c("coefficients", "coefficients reporting", "models"),
  models.n = NULL,
  conf.level = 0.95,
  alpha = NULL,
  tau = NULL,
  digits = 3,
  ...
)

Arguments

x

an object of class '"abe"', an object returned by a call to [abe.resampling()]

type

the type of the output. 'type = "coefficients"' prints summary statistics for each coefficient, 'type = "coefficients reporting"' prints a reduced version of the coefficient statistics, and 'type = "models"' reports model selection frequencies.

models.n

controls the number of models printed if 'type = "models"'. See details.

conf.level

the confidence level, defaults to 0.95, see 'details'

alpha

the alpha value for which the output is to be printed, defaults to 'NULL'

tau

the tau value for which the output is to be printed, defaults to 'NULL'

digits

integer, indicating the number of digits to display in the table. Defaults to 3

...

additional arguments affecting the summary produced.

Details

When using 'type.resampling="Wallisch2021"' in a call to [abe.resampling()], the relative inclusion frequencies of the covariates from the initial model are based on subsampling with sampling proportion equal to 0.5, and all other results are based on bootstrap, as suggested by Wallisch et al. (2021); otherwise all results are obtained using the method specified in 'type.resampling'. Parameter 'conf.level' defines the lower and upper quantiles of the bootstrapped/resampled distribution such that an equal proportion of values is smaller than the lower quantile and larger than the upper quantile.
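
The role of 'conf.level' can be illustrated with a small base R sketch. The vector 'beta_resampled' is a hypothetical set of resampled coefficient estimates, and the use of 'quantile()' with its default settings is an assumption; the package may compute its percentiles differently.

```r
set.seed(1)
# hypothetical resampled estimates of one regression coefficient
beta_resampled <- rnorm(1000, mean = 5, sd = 2)

conf.level <- 0.95
tail_prob <- (1 - conf.level) / 2   # probability left in each tail
limits <- quantile(beta_resampled, probs = c(tail_prob, 1 - tail_prob))
limits

# proportion of values below the lower and above the upper limit
mean(beta_resampled < limits[1])
mean(beta_resampled > limits[2])
```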

If 'type = "models"', the 'models.n' parameter controls the number of models printed. One option is to directly specify the number of models to return (i.e. an integer larger than 1). Alternatively, if 'models.n' is set to a number less than (or equal to) 1, the number of models returned is such that the cumulative frequency attains that value. By default ('models.n = NULL'), the top 20 models or all models up to a cumulative frequency of 0.8, whichever is shorter, are returned. The selected model is marked with an asterisk. If it is not among the printed models, it is added as the last model.
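
One way to read the 'models.n' rule is sketched below in base R. The helper 'n_models_to_print()' is hypothetical and not part of the package. It assumes that "attains that value" means the first model at which the cumulative frequency reaches the target, and that "whichever is shorter" means the smaller of the two counts.

```r
# freq: relative model frequencies, sorted in decreasing order
n_models_to_print <- function(freq, models.n = NULL) {
  # number of models needed for the cumulative frequency to reach 'target'
  n_to_reach <- function(target) {
    k <- which(cumsum(freq) >= target)[1]
    if (is.na(k)) length(freq) else k
  }
  if (is.null(models.n)) {
    # default: top 20 or cumulative frequency 0.8, whichever is shorter
    return(min(20, n_to_reach(0.8)))
  }
  if (models.n <= 1) {
    # interpreted as a target cumulative frequency
    return(n_to_reach(models.n))
  }
  # interpreted as a number of models
  min(models.n, length(freq))
}

# hypothetical model frequencies; cumulative: 0.5, 0.75, 0.875, 0.9375, 1
freq <- c(0.5, 0.25, 0.125, 0.0625, 0.0625)

n_models_to_print(freq)                  # default rule
n_models_to_print(freq, models.n = 0.9)  # cumulative frequency target
n_models_to_print(freq, models.n = 2)    # fixed number of models
```

The selected model, if not among the printed ones, would be appended afterwards; that step is not shown here.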

Author(s)

Rok Blagus, rok.blagus@mf.uni-lj.si

Sladana Babic

Daniela Dunkler

Gregor Steiner

References

Wallisch C, Dunkler D, Rauch G, de Bin R, Heinze G. Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling. Statistics in Medicine 40:369-381, 2021.

See Also

abe.resampling, summary.abe, plot.abe, pie.abe

Examples

set.seed(100)
n = 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y<- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y = y,x1 = x1, x2 = x2, x3 = x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data= dd)

fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
tau = c(0.05, 0.1), exact = TRUE, criterion = "alpha", alpha = c(0.2, 0.05),
type.test = "Chisq", num.resamples = 50, type.resampling = "Wallisch2021")

print(fit.resample, conf.level = 0.95, alpha = 0.2, tau = 0.05)
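
The other output types listed under Arguments could be requested as in the sketch below. It uses only the arguments shown under Usage; the particular values of 'models.n' are chosen for illustration.

```r
library(abe)

set.seed(100)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
  tau = c(0.05, 0.1), exact = TRUE, criterion = "alpha", alpha = c(0.2, 0.05),
  type.test = "Chisq", num.resamples = 50, type.resampling = "Wallisch2021")

# model selection frequencies: the two most frequent models
print(fit.resample, type = "models", models.n = 2, alpha = 0.2, tau = 0.05)

# model selection frequencies: models up to a cumulative frequency of 0.9
print(fit.resample, type = "models", models.n = 0.9, alpha = 0.2, tau = 0.05)

# reduced version of the coefficient statistics
print(fit.resample, type = "coefficients reporting", alpha = 0.2, tau = 0.05)
```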

Summary Function

Description

Makes a summary of a resampled version of ABE.

Usage

## S3 method for class 'abe'
summary(
  object,
  conf.level = 0.95,
  pval = 0.01,
  alpha = NULL,
  tau = NULL,
  models.n = NULL,
  ...
)

Arguments

object

an object of class '"abe"', an object returned by a call to [abe.resampling()]

conf.level

the confidence level, defaults to 0.95, see 'details'

pval

significance level to be used to determine a significant deviation from the expected pairwise inclusion frequency under independence.

alpha

the alpha value for which the output is to be printed. If 'NULL', the output is printed for all alpha values.

tau

the tau value for which the output is to be printed. If 'NULL', the output is printed for all tau values.

models.n

controls the number of models printed for 'model.rel.frequencies'. See details.

...

additional arguments affecting the summary produced.

Details

Parameter 'conf.level' defines the lower and upper quantiles of the bootstrapped/resampled distribution such that an equal proportion of values is smaller than the lower quantile and larger than the upper quantile.

The 'models.n' parameter controls the number of models printed in 'model.rel.frequencies'. One option is to directly specify the number of models to return (i.e. an integer larger than 1). Alternatively, if 'models.n' is set to a number less than (or equal to) 1, the number of models returned is such that the cumulative frequency attains that value. By default ('models.n = NULL'), the top 20 models or all models up to a cumulative frequency of 0.8, whichever is shorter, are returned. The selected model is marked with an asterisk. If it is not among the printed models, it is added as the last model.

Value

a list with the following elements:

'var.rel.frequencies': relative inclusion frequencies for all variables from the initial model; if using 'type.resampling="Wallisch2021"' in a call to [abe.resampling()] these results are based on subsampling with sampling proportion equal to 0.5, otherwise on the method specified by 'type.resampling'

'model.rel.frequencies': relative frequencies of the final models; if using 'type.resampling="Wallisch2021"' in a call to [abe.resampling()] these results are based on subsampling with sampling proportion equal to 0.5, otherwise on the method specified by 'type.resampling'

'var.coefs': coefficient estimates and standard errors from the global and the selected model, and medians, means, percentiles and standard deviations of the resampled estimates for each variable from the initial model; if using 'type.resampling="Wallisch2021"' in a call to [abe.resampling()] these results are based on bootstrap, otherwise on the method specified by 'type.resampling'

'pair.rel.frequencies': pairwise selection frequencies (in percent) for all pairs of variables. The significance of the deviation from the expected pairwise inclusion under independence is tested using a chi-squared test. If using 'type.resampling="Wallisch2021"' in a call to [abe.resampling()] these results are based on subsampling with sampling proportion equal to 0.5, otherwise on the method specified by 'type.resampling'
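
The comparison behind 'pair.rel.frequencies' can be sketched in base R. The inclusion matrix 'incl' below is simulated and hypothetical, and the use of 'chisq.test()' with its default settings is an assumption; the page does not state how the package configures its chi-squared test.

```r
set.seed(1)
B <- 200  # hypothetical number of resamples

# hypothetical inclusion indicators; x2 and x3 tend to be selected together
x1 <- rbinom(B, 1, 0.9)
x2 <- rbinom(B, 1, 0.6)
x3 <- ifelse(runif(B) < 0.8, x2, rbinom(B, 1, 0.5))
incl <- cbind(x1 = x1, x2 = x2, x3 = x3)

marginal <- colMeans(incl)             # inclusion frequency of each variable
observed <- crossprod(incl) / B        # off-diagonal: joint inclusion frequencies
expected <- outer(marginal, marginal)  # product of marginals under independence

# differences in percent; only the off-diagonal entries are meaningful here
round(100 * (observed - expected), 1)

# chi-squared test of independence for the pair (x2, x3)
test_23 <- chisq.test(table(incl[, "x2"], incl[, "x3"]))
test_23$p.value < 0.01
```

A positive off-diagonal difference corresponds to a pair that is selected together more often than expected under independence.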

Author(s)

Rok Blagus, rok.blagus@mf.uni-lj.si

Sladana Babic

Daniela Dunkler

Gregor Steiner

See Also

abe.resampling, print.abe, plot.abe, pie.abe

Examples

set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
tau = c(0.05, 0.1), exact = TRUE,
criterion = "alpha", alpha = c(0.2, 0.05), type.test = "Chisq",
num.resamples = 50, type.resampling = "Wallisch2021")

summary(fit.resample)
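
Since the Value section describes the result as a list, its elements could be inspected as sketched below. The element names are taken from the Value section; how they are nested when several alpha and tau values are present is not described in this page, so the sketch restricts the summary to a single combination.

```r
library(abe)

set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- -5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 5)
dd <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
fit <- lm(y ~ x1 + x2 + x3, x = TRUE, y = TRUE, data = dd)

fit.resample <- abe.resampling(fit, data = dd, include = "x1", active = "x2",
  tau = c(0.05, 0.1), exact = TRUE, criterion = "alpha", alpha = c(0.2, 0.05),
  type.test = "Chisq", num.resamples = 50, type.resampling = "Wallisch2021")

# restrict the summary to one alpha/tau combination and store it
s <- summary(fit.resample, conf.level = 0.95, alpha = 0.2, tau = 0.05)

names(s)                  # elements listed under Value
s$var.rel.frequencies     # relative inclusion frequencies
s$model.rel.frequencies   # relative frequencies of the final models
```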