Title: | General to Specific Modeling and Indicator Saturation in 2SLS Models |
Version: | 0.1.2 |
Description: | Provides facilities of general to specific model selection for exogenous regressors in 2SLS models. Furthermore, indicator saturation methods can be used to detect outliers and structural breaks in the sample. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 2.10), gets (≥ 0.38), ivreg |
Imports: | stats, stringr |
Suggests: | covr, Formula, knitr, rmarkdown, testthat (≥ 3.0.0) |
URL: | https://github.com/jkurle/ivgets |
BugReports: | https://github.com/jkurle/ivgets/issues |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-07-15 09:34:35 UTC; jonas |
Author: | Kurle Jonas |
Maintainer: | Kurle Jonas <mail@jonaskurle.com> |
Repository: | CRAN |
Date/Publication: | 2024-07-15 10:00:02 UTC |
ivgets: General to Specific Modeling and Indicator Saturation in 2SLS Models
Description
Provides facilities of general to specific model selection for exogenous regressors in 2SLS models. Furthermore, indicator saturation methods can be used to detect outliers and structural breaks in the sample.
Author(s)
Maintainer: Kurle Jonas mail@jonaskurle.com (ORCID)
See Also
Useful links:
https://github.com/jkurle/ivgets
Report bugs at https://github.com/jkurle/ivgets/issues
Artificial data set for illustration.
Description
A data set containing the dependent variable, endogenous and exogenous regressors, and excluded instruments for 2SLS models. The structural error is also stored, even though it is not observed in practice.
Usage
artificial2sls
Format
A data frame with 100 observations (rows) and 16 variables (columns):
name | variable description |
y | dependent variable |
x1 | intercept |
x2 | relevant exogenous regressor |
x3 | irrelevant exogenous regressor |
x4 | irrelevant exogenous regressor |
x5 | irrelevant exogenous regressor |
x6 | irrelevant exogenous regressor |
x7 | irrelevant exogenous regressor |
x8 | irrelevant exogenous regressor |
x9 | irrelevant exogenous regressor |
x10 | irrelevant exogenous regressor |
x11 | relevant endogenous regressor |
u | structural error (in practice unobserved) |
z11 | excluded instrument |
z12 | excluded instrument |
id | unique observation identifier |
Artificial data set with outliers for illustration.
Description
A data set containing the dependent variable, endogenous and exogenous regressors, and excluded instruments for 2SLS models. The structural error is also stored, even though it is not observed in practice. Some errors are contaminated, which makes the corresponding observations outliers.
Usage
artificial2sls_contaminated
Format
A data frame with 100 observations (rows) and 16 variables (columns):
name | variable description |
y | dependent variable |
x1 | intercept |
x2 | relevant exogenous regressor |
x3 | irrelevant exogenous regressor |
x4 | irrelevant exogenous regressor |
x5 | irrelevant exogenous regressor |
x6 | irrelevant exogenous regressor |
x7 | irrelevant exogenous regressor |
x8 | irrelevant exogenous regressor |
x9 | irrelevant exogenous regressor |
x10 | irrelevant exogenous regressor |
x11 | relevant endogenous regressor |
u | structural error (in practice unobserved) |
z11 | excluded instrument |
z12 | excluded instrument |
id | unique observation identifier |
Details
The data frame has two additional attributes: "outliers", which stores the indices of the outliers, and "magnitude", which stores their magnitudes.
Artificial data set without outliers prepared for shiny application.
Description
Artificial data set without outliers prepared for shiny application.
Usage
artificial2sls_shiny
Format
A data frame with 100 observations (rows) and 17 variables (columns):
name | variable description |
y | dependent variable |
x1 | intercept |
x2 | relevant exogenous regressor |
x3 | irrelevant exogenous regressor |
x4 | irrelevant exogenous regressor |
x5 | irrelevant exogenous regressor |
x6 | irrelevant exogenous regressor |
x7 | irrelevant exogenous regressor |
x8 | irrelevant exogenous regressor |
x9 | irrelevant exogenous regressor |
x10 | irrelevant exogenous regressor |
x11 | relevant endogenous regressor |
u | structural error (in practice unobserved) |
z11 | excluded instrument |
z12 | excluded instrument |
id | unique observation identifier |
is.outlier | factor variable indicating whether the observation is an outlier (1) or not (0) |
Extract the first and second stage regressors of ivreg formula
Description
extract_variables takes a formula object for ivreg::ivreg(), i.e. in the format y ~ x1 + x2 | x1 + z2, and extracts its different elements into a list.
Usage
extract_variables(formula)
Arguments
formula |
A formula for the ivreg::ivreg function, i.e. in format
|
Value
extract_variables returns a list with three components: $yvar stores the name of the dependent variable, $first the names of the regressors of the first stage, and $second the names of the second stage regressors.
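The formula splitting can be illustrated with base R alone. What follows is a minimal sketch, not the package's extract_variables(), which may work differently. The helper name split_iv_formula() is hypothetical. The sketch also assumes that "regressors of the first stage" means the instruments after the `|`, and "second stage regressors" means the structural regressors before it.

```r
# Hypothetical helper (not the package's extract_variables()): splits a
# two-part ivreg formula "y ~ regressors | instruments" using base R only.
split_iv_formula <- function(formula) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3L)
  rhs <- formula[[3]]
  # `|` binds more loosely than `+`, so the right-hand side is a call to `|`
  if (!is.call(rhs) || !identical(rhs[[1]], as.name("|"))) {
    stop("formula must have the form y ~ regressors | instruments")
  }
  list(
    yvar   = all.vars(formula[[2]]),
    first  = all.vars(rhs[[3]]),  # instruments: regressors of the first stage
    second = all.vars(rhs[[2]])   # structural (second stage) regressors
  )
}

parts <- split_iv_formula(y ~ x1 + x2 | x1 + z2)
str(parts)
```

Because all.vars() only collects variable names, intercept terms such as `- 1` are silently dropped. The package's new_formula() handles intercepts separately.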
Function factory for creating indicators from their names
Description
factory_indicators creates a function that takes the name of an indicator and returns the corresponding indicator to be used in a regression. For user-specified indicators, it extracts the corresponding column from the uis matrix.
Usage
factory_indicators(n)
Arguments
n |
An integer specifying the length of the indicators. |
Details
Argument n should equal the number of observations in the data set that will be augmented with the indicators. The created function takes the name of an indicator and the original uis argument that was used in indicator saturation, and returns the indicator.
Value
factory_indicators returns a function called creator().
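The function-factory pattern can be sketched in a few lines of R. This is a hypothetical illustration (make_indicator_factory() is not part of the package) and it omits the uis case. It assumes indicator names of the form "iis3", "sis3" and "tis3". It further assumes that a step indicator equals 1 from the given observation onward and that a trend indicator counts 1, 2, ... from that observation onward. Check these conventions against gets::isat() before relying on them.

```r
# Hypothetical sketch of a function factory for impulse (iis), step (sis)
# and trend (tis) indicators; user-specified (uis) indicators are not handled.
make_indicator_factory <- function(n) {
  force(n)  # capture the sample size in the closure
  function(name) {
    type <- sub("[0-9]+$", "", name)             # e.g. "iis"
    t <- suppressWarnings(as.integer(sub("^[a-z]+", "", name)))  # e.g. 3
    if (is.na(t) || t < 1L || t > n) stop("invalid indicator name: ", name)
    obs <- seq_len(n)
    switch(type,
      iis = as.numeric(obs == t),                # 1 only at observation t
      sis = as.numeric(obs >= t),                # assumed: 1 from t onward
      tis = ifelse(obs >= t, obs - t + 1, 0),    # assumed: trend from t onward
      stop("unknown indicator type: ", type))
  }
}

creator <- make_indicator_factory(5)
creator("iis3")
creator("sis4")
creator("tis4")
```

The closure keeps n fixed, so every indicator created later has the same length as the data set it augments.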
Gets modeling on an ivreg object
Description
gets.ivreg conducts general-to-specific model selection on an ivreg object returned by ivreg::ivreg().
Usage
## S3 method for class 'ivreg'
gets(
x,
gum.result = NULL,
t.pval = 0.05,
wald.pval = t.pval,
do.pet = TRUE,
ar.LjungB = NULL,
arch.LjungB = NULL,
normality.JarqueB = NULL,
include.gum = FALSE,
include.1cut = FALSE,
include.empty = FALSE,
max.paths = NULL,
turbo = FALSE,
tol = 1e-07,
max.regs = NULL,
print.searchinfo = TRUE,
alarm = FALSE,
keep_exog = NULL,
overid = NULL,
weak = NULL,
...
)
Arguments
x |
An object of class "ivreg", as returned by ivreg::ivreg(). |
gum.result |
a |
t.pval |
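numeric value between 0 and 1. The significance level used for the two-sided regressor significance t-tests. |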
wald.pval |
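numeric value between 0 and 1. The significance level used for the Parsimonious Encompassing Tests (PETs). |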
do.pet |
|
ar.LjungB |
a two element |
arch.LjungB |
a two element |
normality.JarqueB |
|
include.gum |
|
include.1cut |
|
include.empty |
|
max.paths |
|
turbo |
|
tol |
numeric value ( |
max.regs |
integer. The maximum number of regressions along a deletion path. Altering it is not recommended. |
print.searchinfo |
|
alarm |
|
keep_exog |
A numeric vector of indices or a character vector of names
corresponding to the exogenous regressors in the |
overid |
|
weak |
|
... |
Further arguments passed to or from other methods. |
Value
Returns a list of class "ivgets" with three named elements:
$selection stores the selection results from getsFun (including paths, terminal models, and best specification).
$final stores the ivreg model object of the best specification, or NULL if the GUM does not pass all diagnostics.
$keep stores the names of the regressors that were not selected over, including the endogenous regressors, which are always kept.
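The core idea of selecting over exogenous regressors while always keeping the endogenous ones can be illustrated with a deliberately simplified search. This sketch does not reproduce gets.ivreg(). It follows a single deletion path, drops the least significant candidate until all remaining candidates pass the t-test at t.pval, and runs no PETs or diagnostics. The helper names tsls() and gets_single_path() are hypothetical. The sketch also assumes that a deleted exogenous regressor leaves the instrument set as well. The package may handle this differently.

```r
# Hypothetical sketch, not the package's gets.ivreg(): single-path backward
# elimination over exogenous regressors in a 2SLS model.

# Basic 2SLS: Xhat = Pz X, beta = (Xhat'Xhat)^{-1} Xhat'y
tsls <- function(y, X, Z) {
  Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  beta <- solve(crossprod(Xhat), crossprod(Xhat, y))
  res <- y - X %*% beta                    # structural residuals use X, not Xhat
  sigma2 <- sum(res^2) / (nrow(X) - ncol(X))
  se <- sqrt(diag(sigma2 * solve(crossprod(Xhat))))
  list(coef = drop(beta), se = se)
}

gets_single_path <- function(y, xexog, xendog, zexcl, keep = character(0),
                             t.pval = 0.05) {
  current <- colnames(xexog)
  repeat {
    # Assumption: a deleted exogenous regressor also leaves the instrument set
    X <- cbind(xexog[, current, drop = FALSE], xendog)
    Z <- cbind(xexog[, current, drop = FALSE], zexcl)
    fit <- tsls(y, X, Z)
    p <- 2 * pt(abs(fit$coef / fit$se), df = length(y) - ncol(X),
                lower.tail = FALSE)
    names(p) <- names(fit$coef)
    # Endogenous regressors and `keep` are never candidates for deletion
    cand <- setdiff(current, keep)
    if (length(cand) == 0L) break
    worst <- cand[which.max(p[cand])]
    if (p[worst] <= t.pval) break          # all candidates significant: stop
    current <- setdiff(current, worst)
  }
  list(selected = current, coef = fit$coef)
}

# Simulated example: x3 is irrelevant, x11 is endogenous
set.seed(1)
n <- 200
zexcl <- cbind(z11 = rnorm(n), z12 = rnorm(n))
u <- rnorm(n)
x11 <- zexcl[, "z11"] + zexcl[, "z12"] + 0.5 * u + rnorm(n)
xexog <- cbind(x1 = 1, x2 = rnorm(n), x3 = rnorm(n))
y <- 1 + 2 * xexog[, "x2"] + x11 + u
res <- gets_single_path(y, xexog, cbind(x11 = x11), zexcl, keep = "x1")
res$selected
```

A single path can lock in an early wrong deletion. This is why the actual search explores several paths and compares terminal models.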
Indicator saturation modeling on an ivreg object
Description
isat.ivreg conducts indicator saturation model selection on an ivreg object returned by ivreg::ivreg().
Usage
## S3 method for class 'ivreg'
isat(
y,
iis = TRUE,
sis = FALSE,
tis = FALSE,
uis = FALSE,
blocks = NULL,
ratio.threshold = 0.8,
max.block.size = 30,
t.pval = 1/NROW(data),
wald.pval = t.pval,
do.pet = FALSE,
ar.LjungB = NULL,
arch.LjungB = NULL,
normality.JarqueB = NULL,
info.method = c("sc", "aic", "hq"),
include.1cut = FALSE,
include.empty = FALSE,
max.paths = NULL,
parallel.options = NULL,
turbo = FALSE,
tol = 1e-07,
max.regs = NULL,
print.searchinfo = TRUE,
plot = NULL,
alarm = FALSE,
overid = NULL,
weak = NULL,
fast = FALSE,
...
)
Arguments
y |
An object of class "ivreg", as returned by ivreg::ivreg(). |
iis |
logical. If |
sis |
logical. If |
tis |
logical. If |
uis |
a matrix of regressors, or a list of matrices. If a list, the matrices must have named columns that should not overlap with column names of any other matrices in the list. |
blocks |
|
ratio.threshold |
Minimum ratio of variables in each block to total observations to determine the block size, default=0.8. Only relevant if blocks = NULL. |
max.block.size |
Maximum size of the blocks of variables to be selected over, default=30. The block size used is the maximum of the sizes implied by ratio.threshold and max.block.size. |
t.pval |
numeric value between 0 and 1. The significance level used for the two-sided regressor significance t-tests |
wald.pval |
numeric value between 0 and 1. The significance level used for the Parsimonious Encompassing Tests (PETs) |
do.pet |
logical. If |
ar.LjungB |
a two-item list with names |
arch.LjungB |
a two-item list with names |
normality.JarqueB |
|
info.method |
character string, "sc" (default), "aic" or "hq", which determines the information criterion to be used when selecting among terminal models. The abbreviations are short for the Schwarz or Bayesian information criterion (sc), the Akaike information criterion (aic) and the Hannan-Quinn (hq) information criterion |
include.1cut |
logical. If |
include.empty |
logical. If |
max.paths |
|
parallel.options |
|
turbo |
logical. If |
tol |
numeric value (default = 1e-07). The tolerance for detecting linear dependencies in the columns of the regressors (see |
max.regs |
integer. The maximum number of regressions along a deletion path. Altering it is not recommended. |
print.searchinfo |
logical. If |
plot |
NULL or logical. If |
alarm |
logical. If |
overid |
|
weak |
|
fast |
A logical value indicating whether to speed up the 2SLS estimation at the cost of providing fewer details. Requires |
... |
Further arguments passed to or from other methods. |
Value
Returns a list of class "ivisat" with two named elements:
$selection stores the selection results from isat (including paths, terminal models, and best specification).
$final stores the ivreg model object of the best specification, or NULL if the GUM does not pass all diagnostics.
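The idea behind impulse indicator saturation (iis = TRUE) can be shown with the classic split-half procedure, which is simpler than the block search used by isat. The sketch adds impulse dummies for the first half of the sample and keeps the significant ones. It then repeats this for the second half and re-estimates with the union of the retained dummies. The dummies enter both the structural regressors and the instrument set, which is an assumption of this sketch. The names tsls() and iis_split_half() are hypothetical. The default t.pval = 1/n mirrors the default shown in the Usage section.

```r
# Hypothetical sketch, not the package's isat.ivreg(): split-half impulse
# indicator saturation (IIS) in a 2SLS model.

# Basic 2SLS: Xhat = Pz X, beta = (Xhat'Xhat)^{-1} Xhat'y
tsls <- function(y, X, Z) {
  Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  beta <- solve(crossprod(Xhat), crossprod(Xhat, y))
  res <- y - X %*% beta
  sigma2 <- sum(res^2) / (nrow(X) - ncol(X))
  se <- sqrt(diag(sigma2 * solve(crossprod(Xhat))))
  list(coef = drop(beta), se = se)
}

iis_split_half <- function(y, xexog, xendog, zexcl, t.pval = 1 / length(y)) {
  n <- length(y)
  D <- diag(n)
  colnames(D) <- paste0("iis", seq_len(n))
  # Add the impulse dummies in `idx` as exogenous regressors and instruments,
  # then return the names of those significant at level t.pval
  sig_dummies <- function(idx) {
    if (length(idx) == 0L) return(character(0))
    Dk <- D[, idx, drop = FALSE]
    X <- cbind(xexog, Dk, xendog)
    Z <- cbind(xexog, Dk, zexcl)
    fit <- tsls(y, X, Z)
    p <- 2 * pt(abs(fit$coef / fit$se), df = n - ncol(X), lower.tail = FALSE)
    names(p) <- names(fit$coef)
    colnames(Dk)[p[colnames(Dk)] < t.pval]
  }
  first <- seq_len(n %/% 2)
  retained <- c(sig_dummies(first), sig_dummies(setdiff(seq_len(n), first)))
  # Final step: re-estimate with the union of the retained dummies
  sig_dummies(match(retained, colnames(D)))
}

# Simulated example with two contaminated observations
set.seed(42)
n <- 100
zexcl <- cbind(z11 = rnorm(n), z12 = rnorm(n))
u <- rnorm(n)
x11 <- zexcl[, "z11"] + zexcl[, "z12"] + 0.5 * u + rnorm(n)
xexog <- cbind(x1 = 1, x2 = rnorm(n))
y <- 1 + 2 * xexog[, "x2"] + x11 + u
y[c(10, 70)] <- y[c(10, 70)] + 10        # contaminate two observations
found <- iis_split_half(y, xexog, cbind(x11 = x11), zexcl)
found
```

With t.pval = 1/n, about one irrelevant dummy is expected to be retained by chance. The output may therefore contain more than the two contaminated observations.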
User diagnostics for getsFun() and isat()
Description
ivDiag provides several diagnostic tests for 2SLS models that can be used during model selection. Currently, a weak instrument F-test of the first stage(s) and the Sargan test of overidentifying restrictions on the validity of the instruments are implemented.
Usage
ivDiag(x, weak = FALSE, overid = FALSE)
Arguments
x |
A list containing the estimation results of the 2SLS model. Must
contain an entry |
weak |
A logical value whether to conduct weak instrument tests. |
overid |
A logical value whether to conduct the Sargan test of overidentifying restrictions. |
Details
The resulting matrix also has an attribute named "is.reject.bad", which is a logical vector of length m. Each entry records whether a rejection of the corresponding test means that the diagnostic has failed (TRUE) or passed (FALSE). The first entry refers to the first row, the second entry to the second row, and so on. However, this attribute is not used in the subsequent estimations. Instead, the decision rule is specified inside the user.fun argument of gets::diagnostics(), which allows for a named entry $is.reject.bad.
Value
Returns a matrix with three columns named "statistic", "df", and "p-value", and m rows. Each row records these results for one of the tests, so the number of rows depends on the specified arguments and on the model (e.g. how many first-stage equations there are).
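Both tests can be computed in base R and arranged in the documented layout. This is a hypothetical sketch, not the package's ivDiag(), and the function and row names are invented for illustration. The weak instrument test is taken here as the first-stage F-test that the excluded instruments are jointly zero, with the numerator degrees of freedom stored in "df". The Sargan statistic is n times u'P_Z u / u'u, computed from the 2SLS residuals. It needs at least one overidentifying restriction.

```r
# Hypothetical sketch, not the package's ivDiag(): first-stage F-tests of the
# excluded instruments and a Sargan test, in a statistic/df/p-value matrix.
iv_diag_sketch <- function(y, xexog, xendog, zexcl, weak = TRUE, overid = TRUE) {
  stopifnot(weak || overid)
  X <- cbind(xexog, xendog)
  Z <- cbind(xexog, zexcl)
  rows <- list()
  bad <- logical(0)
  if (weak) {
    for (j in colnames(xendog)) {
      # F-test that the excluded instruments are jointly zero in the first stage
      restricted <- lm(xendog[, j] ~ xexog - 1)
      unrestricted <- lm(xendog[, j] ~ Z - 1)
      a <- anova(restricted, unrestricted)
      rows[[paste0("weak_", j)]] <- c(a$F[2], a$Df[2], a[["Pr(>F)"]][2])
      bad <- c(bad, FALSE)   # rejection indicates strong instruments
    }
  }
  if (overid) {
    # Sargan statistic n * u'Pz u / u'u with 2SLS residuals u
    Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
    beta <- solve(crossprod(Xhat), crossprod(Xhat, y))
    u <- drop(y - X %*% beta)
    Pu <- qr.fitted(qr(Z), u)
    stat <- length(y) * sum(Pu^2) / sum(u^2)
    dfree <- ncol(Z) - ncol(X)   # number of overidentifying restrictions
    rows[["sargan"]] <- c(stat, dfree, pchisq(stat, dfree, lower.tail = FALSE))
    bad <- c(bad, TRUE)          # rejection casts doubt on instrument validity
  }
  m <- do.call(rbind, rows)
  colnames(m) <- c("statistic", "df", "p-value")
  attr(m, "is.reject.bad") <- bad
  m
}

set.seed(5)
n <- 200
zexcl <- cbind(z11 = rnorm(n), z12 = rnorm(n))
u <- rnorm(n)
x11 <- zexcl[, "z11"] + zexcl[, "z12"] + 0.5 * u + rnorm(n)
xexog <- cbind(x1 = 1, x2 = rnorm(n))
y <- 1 + 2 * xexog[, "x2"] + x11 + u
m <- iv_diag_sketch(y, xexog, cbind(x11 = x11), zexcl)
m
```

The attribute follows the convention described above: a rejected weak instrument test is good news, while a rejected Sargan test is bad news.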
General-to-specific modeling for 2SLS models
Description
General-to-specific modeling for 2SLS models
Usage
ivgets(
formula,
data,
gum.result = NULL,
t.pval = 0.05,
wald.pval = t.pval,
do.pet = TRUE,
ar.LjungB = NULL,
arch.LjungB = NULL,
normality.JarqueB = NULL,
include.gum = FALSE,
include.1cut = FALSE,
include.empty = FALSE,
max.paths = NULL,
turbo = FALSE,
tol = 1e-07,
max.regs = NULL,
print.searchinfo = TRUE,
alarm = FALSE,
keep_exog = NULL,
overid = NULL,
weak = NULL
)
Arguments
formula |
A formula in the format |
data |
A data frame with all necessary variables y, x, and z. |
gum.result |
a |
t.pval |
|
wald.pval |
|
do.pet |
|
ar.LjungB |
a two element |
arch.LjungB |
a two element |
normality.JarqueB |
|
include.gum |
|
include.1cut |
|
include.empty |
|
max.paths |
|
turbo |
|
tol |
numeric value ( |
max.regs |
integer. The maximum number of regressions along a deletion path. Altering it is not recommended. |
print.searchinfo |
|
alarm |
|
keep_exog |
A numeric vector of indices or a character vector of names
corresponding to the exogenous regressors in the |
overid |
|
weak |
|
Value
Returns a list of class "ivgets" with three named elements:
$selection stores the selection results from getsFun (including paths, terminal models, and best specification).
$final stores the ivreg model object of the best specification, or NULL if the GUM does not pass all diagnostics.
$keep stores the names of the regressors that were not selected over, including the endogenous regressors, which are always kept.
Indicator saturation modeling for 2SLS models
Description
Indicator saturation modeling for 2SLS models
Usage
ivisat(
formula,
data,
iis = TRUE,
sis = FALSE,
tis = FALSE,
uis = FALSE,
blocks = NULL,
ratio.threshold = 0.8,
max.block.size = 30,
t.pval = 1/NROW(data),
wald.pval = t.pval,
do.pet = FALSE,
ar.LjungB = NULL,
arch.LjungB = NULL,
normality.JarqueB = NULL,
info.method = c("sc", "aic", "hq"),
include.1cut = FALSE,
include.empty = FALSE,
max.paths = NULL,
parallel.options = NULL,
turbo = FALSE,
tol = 1e-07,
max.regs = NULL,
print.searchinfo = TRUE,
plot = NULL,
alarm = FALSE,
overid = NULL,
weak = NULL,
fast = FALSE
)
Arguments
formula |
A formula in the format |
data |
A data frame with all necessary variables y, x, and z. |
iis |
logical. If |
sis |
logical. If |
tis |
logical. If |
uis |
a matrix of regressors, or a list of matrices. If a list, the matrices must have named columns that should not overlap with column names of any other matrices in the list. |
blocks |
|
ratio.threshold |
Minimum ratio of variables in each block to total observations to determine the block size, default=0.8. Only relevant if blocks = NULL. |
max.block.size |
Maximum size of the blocks of variables to be selected over, default=30. The block size used is the maximum of the sizes implied by ratio.threshold and max.block.size. |
t.pval |
numeric value between 0 and 1. The significance level used for the two-sided regressor significance t-tests |
wald.pval |
numeric value between 0 and 1. The significance level used for the Parsimonious Encompassing Tests (PETs) |
do.pet |
logical. If |
ar.LjungB |
a two-item list with names |
arch.LjungB |
a two-item list with names |
normality.JarqueB |
|
info.method |
character string, "sc" (default), "aic" or "hq", which determines the information criterion to be used when selecting among terminal models. The abbreviations are short for the Schwarz or Bayesian information criterion (sc), the Akaike information criterion (aic) and the Hannan-Quinn (hq) information criterion |
include.1cut |
logical. If |
include.empty |
logical. If |
max.paths |
|
parallel.options |
|
turbo |
logical. If |
tol |
numeric value (default = 1e-07). The tolerance for detecting linear dependencies in the columns of the regressors (see |
max.regs |
integer. The maximum number of regressions along a deletion path. Altering it is not recommended. |
print.searchinfo |
logical. If |
plot |
NULL or logical. If |
alarm |
logical. If |
overid |
|
weak |
|
fast |
A logical value indicating whether to speed up the 2SLS estimation at the cost of providing fewer details. Requires |
Value
Returns a list of class "ivisat" with two named elements:
$selection stores the selection results from isat (including paths, terminal models, and best specification).
$final stores the ivreg model object of the best specification, or NULL if the GUM does not pass all diagnostics.
User estimator ivreg for getsFun() and isat()
Description
ivregFun calls ivreg::ivreg() in a format that is suitable for the model selection function gets::getsFun() and for the indicator saturation function gets::isat().
Usage
ivregFun(y, x, z, formula, tests, fast = FALSE)
Arguments
y |
A numeric vector with no missing values. |
x |
A matrix or |
z |
A numeric vector or matrix. |
formula |
A formula in the format |
tests |
A logical value whether to calculate the
|
fast |
A logical value whether to speed up the 2SLS estimation at the cost of providing fewer details. Requires |
Details
For the required outputs of user-specified estimators, see the article "User-Specified General-to-Specific and Indicator Saturation Methods" by Genaro Sucarrat, published in the R Journal: https://journal.r-project.org/archive/2021/RJ-2021-024/index.html
Value
A list with entries needed for model selection via gets::getsFun() or gets::isat().
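A user-specified estimator for gets is, roughly, a function of the data that returns a list of estimation results. As we understand the interface described in the R Journal article cited above, the list should contain at least n, k, df, coefficients and vcov, plus logl when information criteria are used. Confirm the exact requirements there. The sketch below is hypothetical (tsls_estimator() is not ivregFun()). It takes the full instrument matrix as z and computes logl as the Gaussian log-likelihood of the structural residuals, which is an assumption. The empty model (no regressors) is not handled.

```r
# Hypothetical sketch of a user-specified 2SLS estimator in the spirit of
# ivregFun(): x holds the structural regressors, z the full instrument set.
tsls_estimator <- function(y, x, z) {
  x <- as.matrix(x)
  z <- as.matrix(z)
  n <- length(y)
  k <- ncol(x)
  Xhat <- z %*% solve(crossprod(z), crossprod(z, x))  # first-stage fit Pz X
  XhXh_inv <- solve(crossprod(Xhat))
  beta <- drop(XhXh_inv %*% crossprod(Xhat, y))
  res <- drop(y - x %*% beta)                         # structural residuals
  sigma2 <- sum(res^2) / (n - k)
  list(
    n = n, k = k, df = n - k,
    coefficients = beta,
    vcov = sigma2 * XhXh_inv,
    # Gaussian log-likelihood with the ML variance estimate (an assumption)
    logl = -n / 2 * (log(2 * pi) + log(sum(res^2) / n) + 1)
  )
}

set.seed(7)
n <- 50
x <- cbind(const = 1, x2 = rnorm(n))
y <- drop(x %*% c(1, 2)) + rnorm(n)
est <- tsls_estimator(y, x, z = x)   # z = x: 2SLS reduces to OLS
est$coefficients
```

Passing the regressors as their own instruments gives a simple sanity check. The estimator should then reproduce lm() exactly, including its log-likelihood.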
Takes ivreg formula and returns formula compatible with model selection
Description
new_formula takes a formula object for ivreg::ivreg(), i.e. in the format y ~ x1 + x2 | x1 + z2, and returns a list with elements suitable for model selection. For example, it updates the data by creating an intercept if one is specified in the formula, checks for collinearity among the regressors, and updates the formula accordingly.
Usage
new_formula(formula, data, keep_exog)
Arguments
formula |
A formula for the ivreg::ivreg function, i.e. in format
|
data |
A data frame. |
keep_exog |
A numeric vector of indices or a character vector of names
corresponding to the exogenous regressors in the |
Value
A list with several named elements. Component $fml stores the new baseline formula that will be used for model selection. Components $y, $x, and $z store the data of the dependent variable, the structural regressors, and the excluded instruments. The entries $depvar, $x1, $x2, $z1, and $z2 contain the names of the dependent variable, the endogenous and exogenous regressors, and the included and excluded instruments. $dx1, $dx2, $dz1, and $dz2 store the dimensions of the respective variables. Finally, $keep and $keep.names contain the indices and names of the regressors that will not be selected over.
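Two of the preparation steps mentioned above, creating an intercept column and removing collinear regressors, can be sketched with base R's model.matrix() and qr(). This is not new_formula() itself, and drop_collinear() is a hypothetical name. qr() with its default tolerance of 1e-07, which matches the tol default used elsewhere in this manual, moves numerically dependent columns behind the rank. The sketch keeps only the columns within the rank.

```r
# Hypothetical sketch: build a design matrix with an intercept and drop
# columns that are linearly dependent on earlier ones.
drop_collinear <- function(X, tol = 1e-07) {
  q <- qr(X, tol = tol)
  keep <- sort(q$pivot[seq_len(q$rank)])  # independent columns, original order
  X[, keep, drop = FALSE]
}

set.seed(3)
dat <- data.frame(x2 = rnorm(20), x3 = rnorm(20))
dat$x4 <- 2 * dat$x3                        # exact linear dependence on x3
X <- model.matrix(~ x2 + x3 + x4, data = dat)  # adds an "(Intercept)" column
Xc <- drop_collinear(X)
colnames(Xc)
```

Because the default (LINPACK) decomposition processes columns in order, the later of two dependent columns is the one that gets dropped.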
2SLS estimator
Description
2SLS estimator that does not allow for weights, offsets, or methods other than 2SLS, and does not calculate influence statistics. It is supposedly faster but returns little output.
Usage
twosls(formula, data)
Arguments
formula |
A formula in the format |
data |
A data frame with all necessary variables y, x, and z. |
Value
twosls() returns a list with eight named elements: $coefficients stores the coefficient estimates of the second stage, $residuals the residuals of the structural equation (i.e. using X and not Xhat), $fitted.values the fitted values of the second stage, $n and $nobs the sample size, $k the number of regressors in the structural equation, $cov.unscaled the unscaled variance-covariance matrix, and $sigma the degrees-of-freedom adjusted equation standard error.
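A lean estimator returning these eight elements can be sketched with base R matrix algebra. This is a hypothetical illustration (twosls_sketch() is not twosls()). It assumes that the data contain no missing values. It also computes the fitted values as X times the coefficients, so that residuals = y - fitted, which is our reading of "second stage" fitted values. The formula is split at `|` into structural regressors and instruments, and model.matrix() adds an intercept to each part unless it is removed in the formula.

```r
# Hypothetical sketch of a lean 2SLS estimator returning eight elements.
twosls_sketch <- function(formula, data) {
  rhs <- formula[[3]]
  stopifnot(is.call(rhs), identical(rhs[[1]], as.name("|")))
  y <- eval(formula[[2]], data, environment(formula))
  X <- model.matrix(eval(call("~", rhs[[2]])), data)  # structural regressors
  Z <- model.matrix(eval(call("~", rhs[[3]])), data)  # instruments
  n <- nrow(X)
  k <- ncol(X)
  Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))  # first-stage fit Pz X
  cov.unscaled <- solve(crossprod(Xhat))
  coefficients <- drop(cov.unscaled %*% crossprod(Xhat, y))
  fitted.values <- drop(X %*% coefficients)   # assumption: X beta, not Xhat beta
  residuals <- y - fitted.values              # structural residuals use X
  list(
    coefficients = coefficients,
    residuals = residuals,
    fitted.values = fitted.values,
    n = n, nobs = n, k = k,
    cov.unscaled = cov.unscaled,
    sigma = sqrt(sum(residuals^2) / (n - k))  # df-adjusted standard error
  )
}

set.seed(11)
dat <- data.frame(z = rnorm(60))
dat$x <- dat$z + rnorm(60)
dat$y <- 1 + 2 * dat$x + rnorm(60)
fit_iv <- twosls_sketch(y ~ x | z, dat)    # x instrumented by z
fit_ols <- twosls_sketch(y ~ x | x, dat)   # instruments = regressors: OLS
fit_iv$coefficients
```

Skipping the weights, offsets and influence statistics that ivreg::ivreg() computes is what keeps such an estimator lean. As the warning below notes, the result is not a full "ivreg" object.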
WARNING
The return value is given class "ivreg", but it is not a true "ivreg" object. This does not pose any problems for internal use, but the object should not be used outside its current internal usage. The class assignment is likely to change in the future.
2SLS estimator alternative
Description
Tests whether this implementation is faster than twosls(), which does not seem to be the case.
Usage
twosls.alt(formula, data)
Arguments
formula |
A formula in the format |
data |
A data frame with all necessary variables y, x, and z. |
WARNING
The return value is given class "ivreg", but it is not a true "ivreg" object. This does not pose any problems for internal use, but the object should not be used outside its current internal usage. The class assignment is likely to change in the future.