Encoding: UTF-8
Version: 0.5-7
Title: Useful Tools for Structural Equation Modeling
Description: Provides miscellaneous tools for structural equation modeling, many of which extend the 'lavaan' package. For example, latent interactions can be estimated using product indicators (Lin et al., 2010, <doi:10.1080/10705511.2010.488999>) and simple effects probed; analytical power analyses can be conducted (Jak et al., 2021, <doi:10.3758/s13428-020-01479-0>); and scale reliability can be estimated based on estimated factor-model parameters.
Depends: R (≥ 4.0), lavaan (≥ 0.6-12), methods
Imports: graphics, pbivnorm, stats, utils
Suggests: Amelia, blavaan, emmeans, lavaan.mi, MASS, mice, mnormt, parallel, restriktor, testthat, GPArotation
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
LazyData: yes
LazyLoad: yes
URL: https://github.com/simsem/semTools/wiki
BugReports: https://github.com/simsem/semTools/issues
Date: 2025-03-12
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-03-12 21:15:08 UTC; terrence
Author: Terrence D. Jorgensen
Maintainer: Terrence D. Jorgensen <TJorgensen314@gmail.com>
Repository: CRAN
Date/Publication: 2025-03-13 08:40:02 UTC
semTools: Useful Tools for Structural Equation Modeling
Description
The semTools package provides many miscellaneous functions that are useful for statistical analysis involving SEM in R. Many functions extend the functionality of the lavaan package. Some sets of functions in semTools correspond to the same theme; we call such a collection of functions a suite. Our suites include:
Model Fit Evaluation:
moreFitIndices(), nullRMSEA(), singleParamTest(), miPowerFit(), and chisqSmallN()
Measurement Invariance:
measEq.syntax(), partialInvariance(), partialInvarianceCat(), and permuteMeasEq()
Power Analysis:
SSpower(), findRMSEApower(), plotRMSEApower(), plotRMSEAdist(), findRMSEAsamplesize(), findRMSEApowernested(), plotRMSEApowernested(), and findRMSEAsamplesizenested()
Missing Data Analysis:
auxiliary(), twostage(), fmi(), bsBootMiss(), quark(), and combinequark()
Latent Interactions:
indProd(), orthogonalize(), probe2WayMC(), probe3WayMC(), probe2WayRC(), probe3WayRC(), and plotProbe()
Exploratory Factor Analysis (EFA):
efa.ekc()
Reliability Estimation:
compRelSEM() and maximalRelia() (see also AVE())
Parceling:
parcelAllocation(), PAVranking(), and poolMAlloc()
Non-Normality:
skew(), kurtosis(), mardiaSkew(), mardiaKurtosis(), and mvrnonnorm()
All users of R (or SEM) are invited to submit functions or ideas for
functions by contacting the maintainer, Terrence Jorgensen
(TJorgensen314@gmail.com). Contributors are encouraged to use
Roxygen
comments to document their contributed code, which is
consistent with the rest of semTools. Read the vignette from the
roxygen2 package for details:
vignette("rd", package = "roxygen2")
Calculate average variance extracted
Description
Calculate average variance extracted (AVE) per factor from a lavaan object
Usage
AVE(object, obs.var = TRUE, omit.imps = c("no.conv", "no.se"),
omit.factors = character(0), dropSingle = TRUE, return.df = TRUE)
Arguments
object |
A lavaan::lavaan or lavaan.mi::lavaan.mi object,
expected to contain only exogenous common factors (i.e., a CFA model).
Cross-loadings are not allowed and will result in |
obs.var |
|
omit.imps |
|
omit.factors |
|
dropSingle |
|
return.df |
|
Details
The average variance extracted (AVE) can be calculated by
AVE = \frac{\bold{1}^\prime
\textrm{diag}\left(\Lambda\Psi\Lambda^\prime\right)\bold{1}}{\bold{1}^\prime
\textrm{diag}\left(\hat{\Sigma}\right) \bold{1}},
Note that this formula is modified from Fornell & Larcker (1981), whose proposed formula assumes that the factor variances are 1. Note also that AVE will not be provided for factors consisting of items with dual loadings: AVE is a property of the items, not of the factors. AVE is calculated with polychoric correlations when ordinal indicators are used.
Value
numeric
vector of average variance extracted from indicators
per factor. For models with multiple "blocks" (any combination of groups
and levels), vectors may be returned as columns in a data.frame
with additional columns indicating the group/level (see return.df=
argument description for caveat).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement errors. Journal of Marketing Research, 18(1), 39–50. doi:10.2307/3151312
See Also
compRelSEM()
for composite reliability estimates
Examples
data(HolzingerSwineford1939)
HS9 <- HolzingerSwineford1939[ , c("x7","x8","x9")]
HSbinary <- as.data.frame( lapply(HS9, cut, 2, labels=FALSE) )
names(HSbinary) <- c("y7","y8","y9")
HS <- cbind(HolzingerSwineford1939, HSbinary)
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ y7 + y8 + y9 '
fit <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE)
## works for factors with exclusively continuous OR categorical indicators
AVE(fit) # uses observed (or unconstrained polychoric/polyserial) by default
AVE(fit, obs.var = FALSE)
## works for multigroup models and for multilevel models (and both)
data(Demo.twolevel)
## assign clusters to arbitrary groups
Demo.twolevel$g <- ifelse(Demo.twolevel$cluster %% 2L, "type1", "type2")
model2 <- ' group: type1
level: within
fac =~ y1 + L2*y2 + L3*y3
level: between
fac =~ y1 + L2*y2 + L3*y3
group: type2
level: within
fac =~ y1 + L2*y2 + L3*y3
level: between
fac =~ y1 + L2*y2 + L3*y3
'
fit2 <- sem(model2, data = Demo.twolevel, cluster = "cluster", group = "g")
AVE(fit2)
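## A minimal sketch of the AVE formula in Details, computed by hand for the
## "visual" factor from the single-group model (fit) above, assuming
## obs.var = FALSE so the model-implied covariance matrix is used:
est <- lavInspect(fit, "est")
ind <- names(which(est$lambda[ , "visual"] != 0))   # indicators of "visual"
common <- est$lambda[ind, "visual"]^2 * est$psi["visual", "visual"]
total <- diag(lavInspect(fit, "cov.ov"))[ind]       # model-implied variances
sum(common) / sum(total)   # should be comparable to AVE(fit, obs.var = FALSE)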
Class For the Results of Bollen-Stine Bootstrap with Incomplete Data
Description
This class contains the results of Bollen-Stine bootstrap with missing data.
Usage
## S4 method for signature 'BootMiss'
show(object)
## S4 method for signature 'BootMiss'
summary(object)
## S4 method for signature 'BootMiss'
hist(x, ..., alpha = 0.05, nd = 2,
printLegend = TRUE, legendArgs = list(x = "topleft"))
Arguments
object, x: object of class BootMiss
...: Additional arguments to pass to graphics::hist()
alpha: alpha level used to draw confidence limits
nd: number of digits to display
printLegend: logical. If TRUE (default), a legend is printed with the histogram
legendArgs: list of arguments passed to graphics::legend(), typically to control the legend's position
Value
The hist method returns a list of length == 2, containing the arguments for the call to hist and the arguments for the call to legend, respectively.
Slots
time: A list containing 2 difftime objects (transform and fit), indicating the time elapsed for data transformation and for fitting the model to bootstrap data sets, respectively.
transData: Transformed data
bootDist: The vector of \chi^2 values from bootstrap data sets fitted by the target model
origChi: The \chi^2 value from the original data set
df: The degrees of freedom of the model
bootP: The p value comparing the original \chi^2 with the bootstrap distribution
Objects from the Class
Objects can be created via the
bsBootMiss()
function.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
Examples
# See the example from the bsBootMiss function
Class For Rotated Results from EFA
Description
This class contains the results of rotated exploratory factor analysis
Usage
## S4 method for signature 'EFA'
show(object)
## S4 method for signature 'EFA'
summary(object, suppress = 0.1, sort = TRUE)
Arguments
object: object of class EFA
suppress: any standardized loadings less than the specified value will not be printed to the screen
sort: logical. If TRUE (default), the standardized loadings are sorted by magnitude when printed
Slots
loading: Rotated standardized factor loading matrix
rotate: Rotation matrix
gradRotate: Gradient of the objective function at the rotated loadings
convergence: Convergence status
phi: Factor correlation matrix. Will be an identity matrix if orthogonal rotation is used.
se: Standard errors of the rotated standardized factor loading matrix
method: Method of rotation
call: The command used to generate this object
Objects from the Class
Objects can be created via the
orthRotate
or oblqRotate
function.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
See Also
efaUnrotate(), orthRotate(), oblqRotate()
Examples
unrotated <- efaUnrotate(HolzingerSwineford1939, nf = 3,
varList = paste0("x", 1:9), estimator = "mlr")
summary(unrotated, std = TRUE)
lavInspect(unrotated, "std")
# Rotated by Quartimin
rotated <- oblqRotate(unrotated, method = "quartimin")
summary(rotated)
Class For Representing A Template of Model Fit Comparisons
Description
This class contains model fit measures and model fit comparisons among multiple models
Usage
## S4 method for signature 'FitDiff'
show(object)
## S4 method for signature 'FitDiff'
summary(object, fit.measures = "default", nd = 3,
tag = "†")
Arguments
object: object of class FitDiff
fit.measures: character vector naming the fit measures to print ("default" or "all")
nd: number of digits printed
tag: single character used to flag the model favored by each fit index
Slots
name: character. The name of each model
model.class: character. One class to which each model belongs
nested: data.frame. Model fit comparisons between adjacently nested models that are ordered by their degrees of freedom (df)
fit: data.frame. Fit measures of all models specified in the name slot, ordered by their df
fit.diff: data.frame. Sequential differences in fit measures in the fit slot
Objects from the Class
Objects can be created via the
compareFit()
function.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Sunthud Pornprasertmanit (psunthud@gmail.com)
See Also
Examples
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit.config <- cfa(HS.model, data = HolzingerSwineford1939, group = "school")
## invariance constraints
fit.metric <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = "loadings")
fit.scalar <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = c("loadings","intercepts"))
fit.strict <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = c("loadings","intercepts","residuals"))
measEqOut <- compareFit(fit.config, fit.metric, fit.scalar, fit.strict)
summary(measEqOut)
summary(measEqOut, fit.measures = "all")
summary(measEqOut, fit.measures = c("aic", "bic"))
if(interactive()){
## Save results to a file
saveFile(measEqOut, file = "measEq.txt")
## Copy to a clipboard
clipboard(measEqOut)
}
Class For the Result of Nesting and Equivalence Testing
Description
This class contains the results of nesting and equivalence testing among multiple models
Usage
## S4 method for signature 'Net'
show(object)
## S4 method for signature 'Net'
summary(object)
Arguments
object: An object of class Net
Value
The show and summary methods print the results of the nesting and equivalence tests among the models (see the test and df slots).
Slots
test: Logical matrix indicating nesting/equivalence among models
df: The degrees of freedom of tested models
Objects from the Class
Objects can be created via the
net()
function.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
Examples
# See the example in the net function.
Class for a lavaan Model Fitted to Multiple Imputations
Description
This class extends the lavaan::lavaanList class, created by fitting a lavaan model to a list of data sets. In this case, the list of data sets consists of multiple imputations of missing data.
Usage
## S4 method for signature 'OLDlavaan.mi'
show(object)
## S4 method for signature 'OLDlavaan.mi'
summary(object, se = TRUE, ci = FALSE,
level = 0.95, standardized = FALSE, rsquare = FALSE, fmi = FALSE,
scale.W = !asymptotic, omit.imps = c("no.conv", "no.se"),
asymptotic = FALSE, header = TRUE, output = "text",
fit.measures = FALSE, ...)
## S4 method for signature 'OLDlavaan.mi'
nobs(object, total = TRUE)
## S4 method for signature 'OLDlavaan.mi'
coef(object, type = "free", labels = TRUE,
omit.imps = c("no.conv", "no.se"))
## S4 method for signature 'OLDlavaan.mi'
vcov(object, type = c("pooled", "between", "within",
"ariv"), scale.W = TRUE, omit.imps = c("no.conv", "no.se"))
## S4 method for signature 'OLDlavaan.mi'
anova(object, ...)
## S4 method for signature 'OLDlavaan.mi'
fitMeasures(object, fit.measures = "all",
baseline.model = NULL, output = "vector", omit.imps = c("no.conv",
"no.se"), ...)
## S4 method for signature 'OLDlavaan.mi'
fitmeasures(object, fit.measures = "all",
baseline.model = NULL, output = "vector", omit.imps = c("no.conv",
"no.se"), ...)
## S4 method for signature 'OLDlavaan.mi'
fitted(object, omit.imps = c("no.conv", "no.se"))
## S4 method for signature 'OLDlavaan.mi'
fitted.values(object, omit.imps = c("no.conv",
"no.se"))
## S4 method for signature 'OLDlavaan.mi'
residuals(object, type = c("raw", "cor"),
omit.imps = c("no.conv", "no.se"))
## S4 method for signature 'OLDlavaan.mi'
resid(object, type = c("raw", "cor"),
omit.imps = c("no.conv", "no.se"))
Arguments
object: An object of class OLDlavaan.mi
se, ci, level, standardized, rsquare, header, output: See lavaan::parameterEstimates() and lavaan::summary().
fmi: logical indicating whether to include the fraction of missing information (FMI) for parameter estimates in the summary output.
scale.W: logical. Controls how the within- and between-imputation components are combined when pooling the sampling covariance matrix of estimates (see vcov).
omit.imps: character vector of criteria for omitting imputations from pooled results. The default omits imputations that did not converge or for which standard errors could not be computed.
asymptotic: logical. If FALSE (default), pooled tests are returned as F statistics; if TRUE, they are returned as (asymptotically equivalent) chi-squared statistics.
fit.measures, baseline.model: See lavaan::fitMeasures().
...: Additional arguments passed to the corresponding lavaan or lavaan.mi functions.
total: logical. If TRUE (default), nobs() returns the total sample size; otherwise, a vector of sample sizes per group.
type: The meaning of this argument varies depending on which method it is used for. Find detailed descriptions in the Value section under coef, vcov, and residuals.
labels: logical indicating whether the vector returned by coef() should be named.
Value
coef: pooled point estimates (see the type= and labels= arguments)
vcov: pooled covariance matrix of the pooled point estimates (see the type= and scale.W= arguments)
fitted.values: model-implied moments, averaged across imputations
fitted: alias for fitted.values
residuals: differences between observed and model-implied moments, averaged across imputations (type = "raw" or "cor")
resid: alias for residuals
nobs: number of observations (total by default, or per group when total = FALSE)
anova: pooled likelihood-ratio test comparing nested models
fitMeasures: pooled fit indices
fitmeasures: alias for fitMeasures
show: prints a brief description of the fitted object
summary: prints pooled parameter estimates and, optionally, fit measures
Slots
coefList: list of estimated coefficients in matrix format (one per imputation) as output by lavInspect(fit, "est")
phiList: list of model-implied latent-variable covariance matrices (one per imputation) as output by lavInspect(fit, "cov.lv")
miList: list of modification indices output by lavaan::modindices()
seed: integer seed set before running imputations
lavListCall: call to lavaan::lavaanList() used to fit the model to the list of imputed data sets in @DataList, stored as a list of arguments
imputeCall: call to imputation function (if used), stored as a list of arguments
convergence: list of logical vectors indicating whether, for each imputed data set, (1) the model converged on a solution, (2) SEs could be calculated, (3) the (residual) covariance matrix of latent variables (\Psi) is non-positive-definite, and (4) the residual covariance matrix of observed variables (\Theta) is non-positive-definite.
lavaanList_slots: All remaining slots are from lavaan::lavaanList, but runMI() only populates a subset of the list slots, two of them with custom information:
DataList: The list of imputed data sets
SampleStatsList: List of output from lavInspect(fit, "sampstat") applied to each fitted model
ParTableList, vcovList, testList, baselineList: See lavaan::lavaanList
h1List: See lavaan::lavaanList. An additional element is added to the list: $PT is the "saturated" model's parameter table, returned by lavaan::lav_partable_unrestricted().
Objects from the Class
See the runMI()
function for details.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Asparouhov, T., & Muthen, B. (2010). Chi-square statistics with multiple imputation. Technical Report. Retrieved from http://www.statmodel.com/
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Meng, X.-L., & Rubin, D. B. (1992). Performing likelihood ratio tests with multiply-imputed data sets. Biometrika, 79(1), 103–111. doi:10.2307/2337151
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
Examples
## See the new lavaan.mi package
Parcel-Allocation Variability in Model Ranking
Description
This function quantifies and assesses the consequences of parcel-allocation
variability for model ranking of structural equation models (SEMs) that
differ in their structural specification but share the same parcel-level
measurement specification (see Sterba & Rights, 2016). This function calls
parcelAllocation()
—which can be used with only one SEM in
isolation—to fit two (assumed) nested models to each of a specified number
of random item-to-parcel allocations. Output includes summary information
about the distribution of model selection results (including plots) and the
distribution of results for each model individually, across allocations
within-sample. Note that this function can be used when selecting among more
than two competing structural models as well (see instructions below
involving the iseed= argument).
Usage
PAVranking(model0, model1, data, parcel.names, item.syntax, nAlloc = 100,
fun = "sem", alpha = 0.05, bic.crit = 10, fit.measures = c("chisq",
"df", "cfi", "tli", "rmsea", "srmr", "logl", "aic", "bic", "bic2"), ...,
show.progress = FALSE, iseed = 12345, warn = FALSE)
Arguments
model0, model1: lavaan model syntax for the two (assumed nested) structural models to compare. They must share the same parcel-level measurement specification (see parcel.names).
data: A data.frame containing the item-level data.
parcel.names: character vector containing the names of all parcels appearing in model0 and model1.
item.syntax: lavaan model syntax specifying the item-level measurement model, used to assign items to parcels (see parcelAllocation()).
nAlloc: The number of random items-to-parcels allocations to generate.
fun: character. Name of the lavaan function used to fit the models to each allocation ("sem" by default).
alpha: Alpha level used as criterion for significance.
bic.crit: Criterion for assessing evidence in favor of one model over another. See Raftery (1995) for guidelines (default is "very strong evidence" in favor of the model with lower BIC).
fit.measures: character vector naming the fit measures to summarize across allocations (see the default in Usage).
...: Additional arguments to be passed to the lavaan function named in fun (see Examples).
show.progress: If TRUE, a progress bar is displayed while fitting models to each allocation.
iseed: (Optional) Random seed used for parceling items. When the same random seed is specified and the program is re-run, the same allocations will be generated. The iseed argument can be used to assess parcel-allocation variability in model ranking when considering more than two models. For each pair of models under comparison, the program should be rerun using the same random seed. Doing so ensures that multiple model comparisons will employ the same set of parcel data sets. Note: When using parallel options, you must first type RNGkind("L'Ecuyer-CMRG") into the R Console (see Examples).
warn: Whether to print warnings when fitting models to each allocation.
Details
This is based on a SAS macro ParcelAlloc
(Sterba & MacCallum, 2010).
The PAVranking() function produces results discussed in Sterba and Rights (2016) relevant to assessing parcel-allocation variability in model selection and model ranking. Specifically, PAVranking() first calls parcelAllocation() to generate a given number (nAlloc=) of item-to-parcel allocations, fits both specified models to each allocation, and provides summaries of PAV for each model. Additionally, PAVranking() provides the following new summaries:
- PAV in model selection index values and model ranking between Models model0= and model1=.
- The proportion of allocations that converged and the proportion of proper solutions (results are summarized only for allocations that both converged and yielded proper solutions).
For further details on the benefits of the random allocation of items to parcels, see Sterba (2011) and Sterba and MacCallum (2010).
To test whether nested models have equivalent fit, results can be pooled across allocations using the same methods available for pooling results across multiple imputations of missing data (see Examples).
Note: This function requires the lavaan package. Missing data must be coded as NA. If the function returns "Error in plot.new() : figure margins too large", the user may need to increase the size of the plot window (e.g., in RStudio) and rerun the function.
Value
A list with 3 elements. The first two (model0.results and model1.results) are results returned by parcelAllocation() for model0 and model1, respectively. The third element (model0.v.model1) is a list of model-comparison results, including the following:
LRT_Summary: The average likelihood ratio test across allocations, as well as the SD, minimum, maximum, range, and the proportion of allocations for which the test was significant.
Fit_Index_Differences: Differences in fit indices, organized by what proportion favored each model and, among those, what the average difference was.
Favored_by_BIC: The proportion of allocations in which each model met the criterion (bic.crit) for evidence in its favor.
Convergence_Summary: The proportion of allocations in which each model (and both models) converged on a solution.
Histograms are also printed to the current plot-output device.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163. doi:10.2307/271063
Sterba, S. K. (2011). Implications of parcel-allocation variability for comparing fit of item-solutions and parcel-solutions. Structural Equation Modeling, 18(4), 554–577. doi:10.1080/10705511.2011.607073
Sterba, S. K., & MacCallum, R. C. (2010). Variability in parameter estimates and model fit across repeated allocations of items to parcels. Multivariate Behavioral Research, 45(2), 322–358. doi:10.1080/00273171003680302
Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation variability in practice: Combining sources of uncertainty and choosing the number of allocations. Multivariate Behavioral Research, 51(2–3), 296–313. doi:10.1080/00273171.2016.1144502
Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model selection: Parcel-allocation variability in model ranking. Psychological Methods, 22(1), 47–68. doi:10.1037/met0000067
See Also
parcelAllocation()
for fitting a single model,
poolMAlloc()
for choosing the number of allocations
Examples
## Specify the item-level model (if NO parcels were created)
## This must apply to BOTH competing models
item.syntax <- c(paste0("f1 =~ f1item", 1:9),
paste0("f2 =~ f2item", 1:9))
cat(item.syntax, sep = "\n")
## Below, we reduce the size of this same model by
## applying different parceling schemes
## Specify a 2-factor CFA with correlated factors, using 3-indicator parcels
mod1 <- '
f1 =~ par1 + par2 + par3
f2 =~ par4 + par5 + par6
'
## Specify a more restricted model with orthogonal factors
mod0 <- '
f1 =~ par1 + par2 + par3
f2 =~ par4 + par5 + par6
f1 ~~ 0*f2
'
## names of parcels (must apply to BOTH models)
(parcel.names <- paste0("par", 1:6))
## override default random-number generator to use parallel options
RNGkind("L'Ecuyer-CMRG")
PAVranking(model0 = mod0, model1 = mod1, data = simParcel, nAlloc = 100,
parcel.names = parcel.names, item.syntax = item.syntax,
# parallel = "multicore", # parallel available on Mac/Linux
           std.lv = TRUE) # any additional lavaan arguments
## POOL RESULTS by treating parcel allocations as multiple imputations.
## Details provided in Sterba & Rights (2016); see ?poolMAlloc.
## save list of data sets instead of fitting model yet
dataList <- parcelAllocation(mod0, # or mod1 (either uses same allocations)
data = simParcel, nAlloc = 100,
parcel.names = parcel.names,
item.syntax = item.syntax,
do.fit = FALSE)
## now fit each model to each data set
if(requireNamespace("lavaan.mi")){
library(lavaan.mi)
fit0 <- cfa.mi(mod0, data = dataList, std.lv = TRUE)
fit1 <- cfa.mi(mod1, data = dataList, std.lv = TRUE)
anova(fit0, fit1) # Pooled test statistic comparing models.
help(package = "lavaan.mi") # Find more methods for pooling results.
}
Power for model parameters
Description
Apply the Satorra & Saris (1985) method for chi-squared power analysis.
Usage
SSpower(powerModel, n, nparam, popModel, mu, Sigma, fun = "sem",
alpha = 0.05, ...)
Arguments
powerModel: lavaan model syntax for the analysis model, which places at least one invalid constraint on a parameter of interest (see Details).
n: integer sample size. A vector of group sample sizes can be provided for multigroup models (see Examples).
nparam: integer. Number of invalid constraints in powerModel, which determines the df of the noncentral chi-squared distribution used to calculate power.
popModel: lavaan model syntax specifying the population model, with all nonzero parameters fixed to their population values (see Details).
mu: Optional. The population mean vector, as an alternative to popModel.
Sigma: Optional. The population covariance matrix, as an alternative to popModel.
fun: character. Name of the lavaan function used to fit the analysis model to the population moments ("sem" by default).
alpha: Type I error rate used to set a criterion for rejecting H0.
...: additional arguments to pass to the lavaan function named in fun (e.g., std.lv).
Details
Specify all non-zero parameters in a population model, either by using
lavaan syntax (popModel
) or by submitting a population covariance
matrix (Sigma
) and optional mean vector (mu
) implied by the
population model. Then specify an analysis model that places at least
one invalid constraint (note the number in the nparam
argument).
There is also a Shiny app called "power4SEM" that provides a graphical user interface for this functionality (Jak et al., 2021). It can be accessed at https://sjak.shinyapps.io/power4SEM/.
Author(s)
Alexander M. Schoemann (East Carolina University; schoemanna@ecu.edu)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50(1), 83–90. doi:10.1007/BF02294150
Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L. (2021). Analytical power calculations for structural equation modeling: A tutorial and Shiny app. Behavior Research Methods, 53, 1385–1406. doi:10.3758/s13428-020-01479-0
Examples
## Specify population values. Note every parameter has a fixed value.
modelP <- '
f1 =~ .7*V1 + .7*V2 + .7*V3 + .7*V4
f2 =~ .7*V5 + .7*V6 + .7*V7 + .7*V8
f1 ~~ .3*f2
f1 ~~ 1*f1
f2 ~~ 1*f2
V1 ~~ .51*V1
V2 ~~ .51*V2
V3 ~~ .51*V3
V4 ~~ .51*V4
V5 ~~ .51*V5
V6 ~~ .51*V6
V7 ~~ .51*V7
V8 ~~ .51*V8
'
## Specify analysis model. Note parameter of interest f1~~f2 is fixed to 0.
modelA <- '
f1 =~ V1 + V2 + V3 + V4
f2 =~ V5 + V6 + V7 + V8
f1 ~~ 0*f2
'
## Calculate power
SSpower(powerModel = modelA, popModel = modelP, n = 150, nparam = 1,
std.lv = TRUE)
## Get power for a range of sample sizes
Ns <- seq(100, 500, 40)
Power <- rep(NA, length(Ns))
for(i in 1:length(Ns)) {
Power[i] <- SSpower(powerModel = modelA, popModel = modelP,
n = Ns[i], nparam = 1, std.lv = TRUE)
}
plot(x = Ns, y = Power, type = "l", xlab = "Sample Size")
## Optionally specify different values for multiple populations
modelP2 <- '
f1 =~ .7*V1 + .7*V2 + .7*V3 + .7*V4
f2 =~ .7*V5 + .7*V6 + .7*V7 + .7*V8
f1 ~~ c(-.3, .3)*f2 # DIFFERENT ACROSS GROUPS
f1 ~~ 1*f1
f2 ~~ 1*f2
V1 ~~ .51*V1
V2 ~~ .51*V2
V3 ~~ .51*V3
V4 ~~ .51*V4
V5 ~~ .51*V5
V6 ~~ .51*V6
V7 ~~ .51*V7
V8 ~~ .51*V8
'
modelA2 <- '
f1 =~ V1 + V2 + V3 + V4
f2 =~ V5 + V6 + V7 + V8
f1 ~~ c(psi21, psi21)*f2 # EQUALITY CONSTRAINT ACROSS GROUPS
'
## Calculate power
SSpower(powerModel = modelA2, popModel = modelP2, n = c(100, 100), nparam = 1,
std.lv = TRUE)
## Get power for a range of sample sizes
Ns2 <- cbind(Group1 = seq(10, 100, 10), Group2 = seq(10, 100, 10))
Power2 <- apply(Ns2, MARGIN = 1, FUN = function(nn) {
SSpower(powerModel = modelA2, popModel = modelP2, n = nn,
nparam = 1, std.lv = TRUE)
})
plot(x = rowSums(Ns2), y = Power2, type = "l", xlab = "Total Sample Size",
ylim = 0:1)
abline(h = c(.8, .9), lty = c("dotted","dashed"))
legend("bottomright", c("80% Power","90% Power"), lty = c("dotted","dashed"))
Implement Saturated Correlates with FIML
Description
Automatically add auxiliary variables to a lavaan model when using full information maximum likelihood (FIML) to handle missing data
Usage
auxiliary(model, data, aux, fun, ..., envir = getNamespace("lavaan"),
return.syntax = FALSE)
lavaan.auxiliary(model, data, aux, ..., envir = getNamespace("lavaan"))
cfa.auxiliary(model, data, aux, ..., envir = getNamespace("lavaan"))
sem.auxiliary(model, data, aux, ..., envir = getNamespace("lavaan"))
growth.auxiliary(model, data, aux, ..., envir = getNamespace("lavaan"))
Arguments
model: The analysis model, which can be specified with 1 of 2 objects: lavaan model syntax (a character string) or a parameter table (e.g., from a previously fitted lavaan model).
data: data.frame containing the analysis variables and the auxiliary variables.
aux: character vector naming the auxiliary variable(s) in data.
fun: character. Name of the lavaan function used to fit the model ("lavaan", "cfa", "sem", or "growth").
...: Additional arguments to pass to the lavaan function named in fun.
envir: Passed to do.call(), indicating where to find the function named in fun.
return.syntax: logical indicating whether to return only the generated syntax for the saturated-correlates (auxiliary-variable) parameters, rather than a fitted model.
Details
These functions are wrappers around the corresponding lavaan functions.
You can use them the same way you use lavaan::lavaan()
, but you
must pass your full data.frame
to the data
argument.
Because the saturated-correlates approach (Enders, 2008) treats exogenous
variables as random, fixed.x
must be set to FALSE
. Because FIML
requires continuous data (although nonnormality corrections can still be
requested), no variables in the model nor auxiliary variables specified in
aux
can be declared as ordered
.
Value
a fitted lavaan::lavaan object. Additional
information is stored as a list
in the @external
slot:
- baseline.model: a fitted lavaan::lavaan object. Results of fitting an appropriate independence model for the calculation of incremental fit indices (e.g., CFI, TLI), in which the auxiliary variables remain saturated, so only the target variables are constrained to be orthogonal. See Examples for how to send this baseline model to lavaan::fitMeasures().
- aux: The character vector of auxiliary variable names.
- baseline.syntax: A character vector generated within the auxiliary function, specifying the baseline.model syntax.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Enders, C. K. (2008). A note on the use of missing auxiliary variables in full information maximum likelihood-based structural equation models. Structural Equation Modeling, 15(3), 434–448. doi:10.1080/10705510802154307
Examples
dat1 <- lavaan::HolzingerSwineford1939
set.seed(12345)
dat1$z <- rnorm(nrow(dat1))
dat1$x5 <- ifelse(dat1$z < quantile(dat1$z, .3), NA, dat1$x5)
dat1$x9 <- ifelse(dat1$z > quantile(dat1$z, .8), NA, dat1$x9)
targetModel <- "
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
"
## works just like cfa(), but with an extra "aux" argument
fitaux1 <- cfa.auxiliary(targetModel, data = dat1, aux = "z",
missing = "fiml", estimator = "mlr")
## with multiple auxiliary variables and multiple groups
fitaux2 <- cfa.auxiliary(targetModel, data = dat1, aux = c("z","ageyr","grade"),
group = "school", group.equal = "loadings")
## calculate correct incremental fit indices (e.g., CFI, TLI)
fitMeasures(fitaux2, fit.measures = c("cfi","tli"))
## NOTE: lavaan will use the internally stored baseline model, which
## is the independence model plus saturated auxiliary parameters
lavInspect(fitaux2@external$baseline.model, "free")
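## Conceptually, the saturated-correlates approach amounts to letting each
## auxiliary variable covary with every observed variable in the target model
## (here, with each indicator's residual). A rough, hand-written sketch of
## that idea (auxiliary() builds this, plus an appropriate baseline model,
## automatically, so point estimates should closely match fitaux1):
satCor <- paste0("z ~~ ", paste0("x", 1:9, collapse = " + "))
cat(satCor, "\n")
fitManual <- cfa(c(targetModel, satCor), data = dat1, fixed.x = FALSE,
                 missing = "fiml", estimator = "mlr")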
Bollen-Stine Bootstrap with the Existence of Missing Data
Description
Implement Bollen and Stine's (1992) bootstrap when missing observations
exist. The implemented method was proposed by Savalei and Yuan (2009). This
can be used in two ways. The first and easiest option is to fit the model to
incomplete data in lavaan
using the FIML estimator, then pass that
lavaan
object to bsBootMiss
.
Usage
bsBootMiss(x, transformation = 2, nBoot = 500, model, rawData, Sigma, Mu,
group, ChiSquared, EMcov, writeTransData = FALSE, transDataOnly = FALSE,
writeBootData = FALSE, bootSamplesOnly = FALSE, writeArgs, seed = NULL,
suppressWarn = TRUE, showProgress = TRUE, ...)
Arguments
x: A target lavaan object, fitted to incomplete data with the FIML estimator.
transformation: The transformation methods in Savalei and Yuan (2009). There are three methods in the article, but only the first two are currently implemented here. Use 1 or 2.
nBoot: The number of bootstrap samples.
model: Optional. The target model if x is not provided.
rawData: Optional. The target raw data set if x is not provided.
Sigma: Optional. The model-implied covariance matrix if x is not provided.
Mu: Optional. The model-implied mean vector if x is not provided.
group: Optional character string specifying the name of the grouping variable in rawData.
ChiSquared: Optional. The model's \chi^2 statistic if x is not provided.
EMcov: Optional, if x is not provided: an EM (saturated) estimate of the covariance matrix of the incomplete data.
writeTransData: Logical. If TRUE, the transformed data set is written to a file (see writeArgs).
transDataOnly: Logical. If TRUE, the transformed data are returned without drawing bootstrap samples or fitting the model.
writeBootData: Logical. If TRUE, the bootstrapped data sets are written to files (see writeArgs).
bootSamplesOnly: Logical. If TRUE, the bootstrapped data sets are returned without being fitted by the target model.
writeArgs: Optional list of arguments passed to utils::write.table() when writing data sets to files.
seed: The seed number used in randomly drawing bootstrap samples.
suppressWarn: Logical. If TRUE, warnings from fitting the model to bootstrap samples are suppressed.
showProgress: Logical. Indicating whether to display a progress bar while fitting models to bootstrap samples.
...: Additional arguments passed to the lavaan function when fitting the model.
Details
The second option is designed for users of other software packages (e.g., LISREL,
EQS, Amos, or Mplus). Users can import their data, \chi^2
value, and
model-implied moments from another package, and they have the option of
saving (or writing to a file) either the transformed data or bootstrapped
samples of that data, which can be analyzed in other programs. In order to
analyze the bootstrapped samples and return a p value, users of other
programs must still specify their model using lavaan syntax.
Value
As a default, this function returns a BootMiss object containing the results of the bootstrap samples. Use show, summary, or hist to examine the results. Optionally, the transformed data set is returned if transDataOnly = TRUE. Optionally, the bootstrap data sets are returned if bootSamplesOnly = TRUE.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Syntax for transformations borrowed from http://www2.psych.ubc.ca/~vsavalei/
References
Bollen, K. A., & Stine, R. A. (1992). Bootstrapping goodness-of-fit measures in structural equation models. Sociological Methods & Research, 21(2), 205–229. doi:10.1177/0049124192021002004
Savalei, V., & Yuan, K.-H. (2009). On the model-based bootstrap with missing data: Obtaining a p-value for a test of exact fit. Multivariate Behavioral Research, 44(6), 741–763. doi:10.1080/00273170903333590
See Also
Examples
dat1 <- HolzingerSwineford1939
dat1$x5 <- ifelse(dat1$x1 <= quantile(dat1$x1, .3), NA, dat1$x5)
dat1$x9 <- ifelse(is.na(dat1$x5), NA, dat1$x9)
targetModel <- "
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
"
targetFit <- sem(targetModel, dat1, meanstructure = TRUE, std.lv = TRUE,
missing = "fiml", group = "school")
summary(targetFit, fit = TRUE, standardized = TRUE)
## The number of bootstrap samples should be much higher than this example
temp <- bsBootMiss(targetFit, transformation = 1, nBoot = 10, seed = 31415)
temp
summary(temp)
hist(temp)
hist(temp, printLegend = FALSE) # suppress the legend
## user can specify alpha level (default: alpha = 0.05), and the number of
## digits to display (default: nd = 2). Pass other arguments to hist(...),
## or a list of arguments to legend() via "legendArgs"
hist(temp, alpha = .01, nd = 3, xlab = "something else", breaks = 25,
legendArgs = list("bottomleft", box.lty = 2))
Calculate the "D2" statistic
Description
This is a utility function used to calculate the "D2" statistic for pooling
test statistics across multiple imputations. This function is called by
several functions used for OLDlavaan.mi objects, such as
lavTestLRT.mi()
, lavTestWald.mi()
, and
lavTestScore.mi()
. But this function can be used for any general
scenario because it only requires a vector of \chi^2
statistics (one
from each imputation) and the degrees of freedom for the test statistic.
See Li, Meng, Raghunathan, & Rubin (1991) and Enders (2010, chapter 8) for
details about how it is calculated.
Usage
calculate.D2(w, DF = 0L, asymptotic = FALSE)
Arguments
w: numeric vector of \chi^2 statistics, one from each imputation (or Wald z statistics; see DF).
DF: degrees of freedom (df) of the \chi^2 statistics. The default (DF = 0) indicates that w contains z statistics, which will be squared to yield \chi^2 statistics with df = 1.
asymptotic: logical. If FALSE (default), the pooled test is returned as an F statistic; if TRUE, it is returned as an (asymptotically equivalent) \chi^2 statistic.
Value
A numeric vector containing the test statistic, df, its p value, and 2 missing-data diagnostics: the relative increase in variance (RIV, or average for multiparameter tests: ARIV) and the fraction of missing information (FMI = ARIV / (1 + ARIV)).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
See Also
lavTestLRT.mi()
, lavTestWald.mi()
, lavTestScore.mi()
Examples
## generate a vector of chi-squared values, just for example
DF <- 3 # degrees of freedom
M <- 20 # number of imputations
CHI <- rchisq(M, DF)
## pool the "results"
calculate.D2(CHI, DF) # by default, an F statistic is returned
calculate.D2(CHI, DF, asymptotic = TRUE) # asymptotically chi-squared
## generate standard-normal values, for an example of Wald z tests
Z <- rnorm(M)
calculate.D2(Z) # default DF = 0 will square Z to make chisq(DF = 1)
## F test is equivalent to a t test with the denominator DF
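## The D2 formula of Li et al. (1991), written out by hand for comparison with
## the output above (a sketch; calculate.D2() is the supported interface and
## may handle edge cases differently):
k <- DF
ariv <- (1 + 1/M) * var(sqrt(CHI))          # (average) relative increase in variance
D2 <- (mean(CHI)/k - ariv * (M + 1)/(M - 1)) / (1 + ariv)
df2 <- k^(-3/M) * (M - 1) * (1 + 1/ariv)^2  # denominator df of the F reference
c(F = D2, df1 = k, df2 = df2, p = pf(D2, k, df2, lower.tail = FALSE))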
Small-N correction for the chi-squared test statistic
Description
Calculate small-N corrections for the \chi^2 model-fit test statistic, to adjust for small sample size (relative to model size).
Usage
chisqSmallN(fit0, fit1 = NULL, smallN.method = if (is.null(fit1))
c("swain", "yuan.2015") else "yuan.2005", ..., omit.imps = c("no.conv",
"no.se"))
Arguments
fit0, fit1: lavaan::lavaan or lavaan.mi::lavaan.mi object(s)
smallN.method: character naming the small-N correction(s) to apply. When a single model is evaluated, "swain" and "yuan.2015" are available (both by default); when two models are compared, "yuan.2005" is used.
...: Additional arguments passed to the underlying test function.
omit.imps: character vector of criteria for omitting imputations from pooled results (relevant only for lavaan.mi objects). The default omits imputations that did not converge or for which standard errors could not be computed.
Details
Four finite-sample adjustments to the chi-squared statistic are currently available, all of which are described in Shi et al. (2018). These all assume normally distributed data, and may not work well with severely nonnormal data. Deng et al. (2018, section 4) review proposed small-N adjustments that do not assume normality, which rarely show promise, so they are not implemented here. This function currently will apply small-N adjustments to scaled test statistics with a warning that they do not perform well (Deng et al., 2018).
Value
A list of numeric vectors: one for the originally requested statistic(s), along with one per requested smallN.method. All include the (un)adjusted test statistic, its df, and the p value for the test under the null hypothesis that the model fits perfectly (or that the 2 models have equivalent fit). The adjusted chi-squared statistic(s) also include(s) the scaling factor for the small-N adjustment.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Deng, L., Yang, M., & Marcoulides, K. M. (2018). Structural equation modeling with many variables: A systematic review of issues and developments. Frontiers in Psychology, 9, 580. doi:10.3389/fpsyg.2018.00580
Shi, D., Lee, T., & Terry, R. A. (2018). Revisiting the model size effect in structural equation modeling. Structural Equation Modeling, 25(1), 21–40. doi:10.1080/10705511.2017.1369088
Examples
HS.model <- '
visual =~ x1 + b1*x2 + x3
textual =~ x4 + b2*x5 + x6
speed =~ x7 + b3*x8 + x9
'
fit1 <- cfa(HS.model, data = HolzingerSwineford1939[1:50,])
## test a single model (implicitly compared to a saturated model)
chisqSmallN(fit1)
## fit a more constrained model
fit0 <- cfa(HS.model, data = HolzingerSwineford1939[1:50,],
orthogonal = TRUE)
## compare 2 models
chisqSmallN(fit1, fit0)
Copy or save the result of lavaan or FitDiff objects to the clipboard or a file
Description
Copy or save the result of a lavaan or FitDiff object to the clipboard or a file. From the clipboard, users may paste the result into Microsoft Excel or another spreadsheet application to create a table of the output.
Usage
clipboard(object, what = "summary", ...)
saveFile(object, file, what = "summary", tableFormat = FALSE,
fit.measures = "default", writeArgs = list(), ...)
Arguments
object: An object of class lavaan::lavaan or FitDiff.
what: The attributes of the object to be copied or saved: "summary" (default), "mifit" (modification indices and fit statistics from miPowerFit()), "coef", "se", "samp", or "fit" (see Examples).
...: Additional arguments when passing a lavaan object to the summary() method (e.g., standardized, rsquare, fit.measures).
file: A file name used for saving the result.
tableFormat: If TRUE, results are saved in a tab-delimited format that can be imported into a spreadsheet; if FALSE (default), results are saved as displayed in the R console.
fit.measures: character vector naming the fit measures to save when object is a FitDiff object ("default" or "all").
writeArgs: list of additional arguments to be passed to utils::write.table() when saving the result.
Value
The resulting output will be saved to the clipboard or a file. If using the clipboard function, users may paste it into other applications.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Examples
library(lavaan)
HW.model <- ' visual =~ x1 + c1*x2 + x3
textual =~ x4 + c1*x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(HW.model, data = HolzingerSwineford1939, group = "school")
if(interactive()){
# Copy the summary of the lavaan object
clipboard(fit)
# pass additional arguments to summary() method for class?lavaan
clipboard(fit, rsquare = TRUE, standardized = TRUE, fit.measures = TRUE)
# Copy modification indices and fit stats from the miPowerFit() function
clipboard(fit, "mifit")
# Copy the parameter estimates
clipboard(fit, "coef")
# Copy the standard errors
clipboard(fit, "se")
# Copy the sample statistics
clipboard(fit, "samp")
# Copy the fit measures
clipboard(fit, "fit")
# Save the summary of the lavaan object
saveFile(fit, "out.txt")
# Save modification indices and fit stats from the miPowerFit() function
saveFile(fit, "out.txt", "mifit")
# Save the parameter estimates
saveFile(fit, "out.txt", "coef")
# Save the standard errors
saveFile(fit, "out.txt", "se")
# Save the sample statistics
saveFile(fit, "out.txt", "samp")
# Save the fit measures
saveFile(fit, "out.txt", "fit")
}
Combine the results from the quark function
Description
This function builds upon the quark() function to provide a final data set composed of the original data set provided to quark() and enough principal components to account for a specified level of variance in the data.
Usage
combinequark(quark, percent)
Arguments
quark: Provide the quark object (list) returned by the quark() function.
percent: Provide a percentage of variance that you would like to have explained. That many components (columns) will be extracted and kept with the output data set. Enter this variable as a number WITHOUT a percentage sign.
Value
The output of this function is the original data set used in quark() combined with enough principal component scores to account for the amount of variance that was requested.
Author(s)
Steven R. Chesnut (University of Southern Mississippi; Steven.Chesnut@usm.edu)
See Also
Examples
set.seed(123321)
dat <- HolzingerSwineford1939[,7:15]
misspat <- matrix(runif(nrow(dat) * 9) < 0.3, nrow(dat))
dat[misspat] <- NA
dat <- cbind(HolzingerSwineford1939[,1:3], dat)
quark.list <- quark(data = dat, id = c(1, 2))
final.data <- combinequark(quark = quark.list, percent = 80)
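## how many principal-component scores were retained to explain 80% of the
## variance (they are appended after the original columns)
ncol(final.data) - ncol(dat)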
Composite Reliability using SEM
Description
Calculate composite reliability from estimated factor-model parameters
Usage
compRelSEM(object, obs.var = TRUE, tau.eq = FALSE, ord.scale = TRUE,
config = character(0), shared = character(0), higher = character(0),
return.total = FALSE, dropSingle = TRUE, omit.factors = character(0),
omit.indicators = character(0), omit.imps = c("no.conv", "no.se"),
return.df = TRUE)
Arguments
object: A lavaan::lavaan or lavaan.mi::lavaan.mi object, expected to contain only exogenous common factors (i.e., a CFA model).
obs.var: logical indicating whether to compute the denominator using observed variances (TRUE, the default) or model-implied variances (FALSE). See Details.
tau.eq: logical. If TRUE, assume (essential) tau-equivalence, yielding coefficient alpha; if FALSE (default), omega-type coefficients are returned. See Details.
ord.scale: logical. When indicators are ordinal, TRUE (default) returns reliability of the observed ordinal scale; FALSE returns reliability of the underlying latent-response scale. See Details.
config: character vector naming configural constructs for which Lai's (2021) multilevel reliability coefficients should be calculated. See Details.
shared: character vector naming shared (Level-2) constructs for which Lai's (2021) multilevel reliability coefficients should be calculated. See Details.
higher: character vector naming higher-order factors whose composite reliability should be calculated from their (latent) indicators. See Details.
return.total: logical. If TRUE, also return reliability for a total composite of indicators from multiple factors; return.total = -1 returns only the total-composite reliability. See Details.
dropSingle: logical. If TRUE (default), factors with a single indicator are omitted from results.
omit.factors: character vector naming any common factors whose reliability is not of interest (e.g., to exclude them from a total composite).
omit.indicators: character vector naming any indicators to omit from composites (e.g., Level-2 auxiliary variables).
omit.imps: character vector of criteria for omitting imputations from pooled results (relevant only for lavaan.mi objects). The default omits imputations that did not converge or for which standard errors could not be computed.
return.df: logical. If TRUE (default), results for multiple "blocks" (groups and/or levels) are returned as a data.frame when possible. See Value.
Details
Several coefficients for factor-analysis reliability have been termed
"omega", which Cho (2021) argues is a misleading misnomer and argues for
using \rho
to represent them all, differentiated by descriptive
subscripts. In our package, we strive to provide unlabeled coefficients,
leaving it to the user to decide on a label in their report. But we do
use the symbols \alpha
and \omega
in the formulas below in order
to distinguish coefficients that do (not) assume essential tau-equivalence.
For higher-order constructs with latent indicators, only \omega
is
available. Lai's (2021) multilevel coefficients are labeled in accordance
with the symbols used in that article (more details below).
Bentler (1968) first introduced factor-analysis reliability for a
unidimensional factor model with congeneric indicators, labeling the
coefficients \alpha. McDonald (1999) later referred to this
and other reliability coefficients, first as \theta
(in 1970),
then as \omega
, which is a source of confusion when reporting
coefficients (Cho, 2021). Coefficients based on factor models were later
generalized to account for multidimensionality (possibly with
cross-loadings) and correlated errors. The general \omega
formula
implemented in this function is:
\omega = \frac{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
Var\left( \psi \right)}{\bold{1}^\prime \hat{\Sigma} \bold{1}},
where \hat{\Sigma}
can be the model-implied covariance matrix from
either the saturated model (i.e., the "observed" covariance matrix, used by
default) or from the hypothesized CFA model, controlled by the
obs.var
argument. A k
-dimensional vector \bold{1}
is used
to sum elements in the matrix. Note that if the model includes any directed
effects (latent regression slopes), all coefficients are calculated
from total factor variances: lavInspect(object, "cov.lv")
.
Assuming (essential) tau-equivalence (tau.eq=TRUE
) makes \omega
equivalent to coefficient \alpha
from classical test theory
(Cronbach, 1951):
\alpha = \frac{k}{k - 1}\left[ 1 - \frac{\sum^{k}_{i = 1}
\sigma_{ii}}{\sum^{k}_{i = 1} \sigma_{ii} + 2\sum_{i < j} \sigma_{ij}}
\right],
where k
is the number of items in a factor's composite,
\sigma_{ii}
signifies item i's variance, and \sigma_{ij}
signifies the covariance between items i and j. Again, the
obs.var
argument controls whether \alpha
is calculated using
the observed or model-implied covariance matrix.
By setting return.total=TRUE
, one can estimate reliability for a
single composite calculated using all indicators in a multidimensional
CFA (Bentler, 1972, 2009). Setting return.total = -1
will return
only the total-composite reliability (not per factor).
Higher-Order Factors:
The reliability of a composite that represents a higher-order construct
requires partitioning the model-implied factor covariance matrix \Phi
in order to isolate the common-factor variance associated only with the
higher-order factor. Using a second-order factor model, the model-implied
covariance matrix of observed indicators \hat{\Sigma}
can be
partitioned into 3 sources:
the second-order common-factor (co)variance:
\Lambda \bold{B} \Phi_2 \bold{B}^{\prime} \Lambda^{\prime}
the residual variance of the first-order common factors (i.e., not accounted for by the second-order factor):
\Lambda \Psi_{u} \Lambda^{\prime}
the measurement error of observed indicators:
\Theta
where \Lambda
contains first-order factor loadings, \bold{B}
contains second-order factor loadings, \Phi_2
is the model-implied
covariance matrix of the second-order factor(s), and \Psi_{u}
is the
covariance matrix of first-order factor disturbances. In practice, we can
use the full \bold{B}
matrix and full model-implied \Phi
matrix
(i.e., including all latent factors) because the zeros in \bold{B}
will cancel out unwanted components of \Phi
. Thus, we can calculate
the proportion of variance of a composite score calculated from the observed
indicators (e.g., a total score or scale mean) that is attributable to the
second-order factor (i.e., coefficient \omega
):
\omega=\frac{\bold{1}^{\prime} \Lambda \bold{B} \Phi \bold{B}^{\prime}
\Lambda^{\prime} \bold{1} }{ \bold{1}^{\prime} \hat{\Sigma} \bold{1}},
where \bold{1}
is the k-dimensional vector of 1s and k
is the number of observed indicators in the composite. Note that if a
higher-order factor also has observed indicators, it is necessary to model
the observed indicators as single-indicator constructs, so that all of the
higher-order factor indicators are latent (with loadings in the Beta matrix,
not Lambda).
Categorical Indicators:
When all indicators (per composite) are ordinal, the ord.scale
argument controls whether the coefficient is calculated on the
latent-response scale (FALSE
) or on the observed ordinal scale
(TRUE
, the default). For \omega
-type coefficients
(tau.eq=FALSE
), Green and Yang's (2009, formula 21) approach is used
to transform factor-model results back to the ordinal response scale. When
ord.scale=TRUE
and tau.eq=TRUE
, coefficient \alpha
is
calculated using the covariance matrix calculated from the integer-valued
numeric weights for ordinal categories, consistent with its definition
(Chalmers, 2018) and the alpha
function in the psych
package;
this implies obs.var=TRUE
, so obs.var=FALSE
will be ignored.
When ord.scale=FALSE
, the standard \alpha
formula is applied to
the polychoric correlation matrix ("ordinal \alpha
"; Zumbo et al., 2007),
estimated from the saturated or hypothesized model (see obs.var
),
and \omega
is calculated from CFA results without applying Green and
Yang's (2009) correction (see Zumbo & Kroc, 2019, for a rationalization).
No method analogous to Green and Yang (2009) has been proposed for
calculating reliability with a mixture of categorical and continuous
indicators, so an error is returned if object
includes factors with a
mixture of indicator types (unless omitted using omit.factors
). If
categorical indicators load on a different factor(s) than continuous
indicators, then reliability will still be calculated separately for those
factors, but return.total
must be FALSE
(unless
omit.factors
is used to isolate factors with indicators of the same
type).
Multilevel Measurement Models:
Under the default settings, compRelSEM()
will apply the same formula
in each "block" (group and/or level of analysis). In the case of multilevel
(ML-)SEMs, this yields "reliability" for latent within- and between-level
components, as proposed by Geldhof et al. (2014). Although this works fine
to calculate reliability per group, this is not recommended for ML-SEMs
because the coefficients do not correspond to actual composites that would
be calculated from the observed data. Lai (2021) proposed coefficients for
reliability of actual composites, depending on the type of construct, which
requires specifying the names of constructs for which reliability is desired
(or multiple constructs whose indicators would compose a multidimensional
composite). Configural (config=
) and/or shared=
constructs
can be specified; the same construct can be specified in both arguments, so
that overall scale-reliability can be estimated for a shared construct by
including it in config
. Instead of organizing the output by block
(the default), specifying config=
and/or shared=
will prompt
organizing the list of output by $config
and/or $shared
.
- The overall (2L) scale reliability for configural constructs is returned, along with the reliability of a purely individual-level composite (W, calculated by cluster-mean centering).
- The reliability for a shared construct quantifies generalizability across both indicators and raters (i.e., subjects rating their cluster's construct). Lüdtke et al. (2011) refer to these as measurement error and sampling error, respectively. An interrater reliability (IRR) coefficient is also returned, quantifying generalizability across rater/sampling error only. To obtain a scale-reliability coefficient (quantifying a shared construct's generalizability across indicator/measurement error only), include the same factor name in config=. Jak et al. (2021) recommended modeling components of the same construct at both levels, but users may also saturate the within-level model (Lai, 2021).
Be careful about including Level-2 variables in the model, especially
whether it makes sense to include them in a total composite for a Level-2
construct. dropSingle=TRUE
only prevents estimating reliability for
a single-indicator construct, not from including such an indicator in a
total composite. It is permissible for shared=
constructs to have
additional indicators at Level-2 only. If it is necessary to model other
Level-2 variables (e.g., to justify the missing-at-random assumption when
using missing="FIML" estimation
), they should be placed in the
omit.indicators=
argument to exclude them from total composites.
Value
A numeric
vector of composite reliability coefficients per
factor, or a list
of vectors per "block" (group and/or level of
analysis), optionally returned as a data.frame
when possible (see
return.df=
argument description for caveat). If there are multiple
factors, whose multidimensional indicators combine into a single
composite, users can request return.total=TRUE
to add a column
including a reliability coefficient for the total composite, or
return.total = -1
to return only the total-composite
reliability (ignored when config=
or shared=
is specified
because each factor's specification must be checked across levels).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Uses hidden functions written by Sunthud Pornprasertmanit
(psunthud@gmail.com) for the old reliability()
function.
References
Bentler, P. M. (1968). Alpha-maximized factor analysis (alphamax): Its relation to alpha and canonical factor analysis. Psychometrika, 33(3), 335–345. doi:10.1007/BF02289328
Bentler, P. M. (1972). A lower-bound method for the dimension-free measurement of internal consistency. Social Science Research, 1(4), 343–357. doi:10.1016/0049-089X(72)90082-8
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74(1), 137–143. doi:10.1007/s11336-008-9100-1
Chalmers, R. P. (2018). On misconceptions and the limited usefulness of ordinal alpha. Educational and Psychological Measurement, 78(6), 1056–1071. doi:10.1177/0013164417727036
Cho, E. (2021) Neither Cronbach’s alpha nor McDonald’s omega: A commentary on Sijtsma and Pfadt. Psychometrika, 86(4), 877–886. doi:10.1007/s11336-021-09801-1
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. doi:10.1007/BF02310555
Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72–91. doi:10.1037/a0032138
Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74(1), 155–167. doi:10.1007/s11336-008-9099-3
Jak, S., Jorgensen, T. D., & Rosseel, Y. (2021). Evaluating cluster-level
factor models with lavaan
and Mplus. Psych, 3(2),
134–152. doi:10.3390/psych3020012
Lai, M. H. C. (2021). Composite reliability of multilevel data: It’s about observed scores and construct meanings. Psychological Methods, 26(1), 90–102. doi:10.1037/met0000287
Lüdtke, O., Marsh, H. W., Robitzsch, A., & Trautwein, U. (2011).
A 2 \times
2 taxonomy of multilevel latent contextual models:
Accuracy–bias trade-offs in full and partial error correction models.
Psychological Methods, 16(4), 444–467. doi:10.1037/a0024376
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
Zumbo, B. D., Gadermann, A. M., & Zeisser, C. (2007). Ordinal versions of coefficients alpha and theta for Likert rating scales. Journal of Modern Applied Statistical Methods, 6(1), 21–29. doi:10.22237/jmasm/1177992180
Zumbo, B. D., & Kroc, E. (2019). A measurement is a choice and Stevens’ scales of measurement do not help make it: A response to Chalmers. Educational and Psychological Measurement, 79(6), 1184–1197. doi:10.1177/0013164419844305
See Also
maximalRelia()
for the maximal reliability of weighted composite
Examples
data(HolzingerSwineford1939)
HS9 <- HolzingerSwineford1939[ , c("x7","x8","x9")]
HSbinary <- as.data.frame( lapply(HS9, cut, 2, labels=FALSE) )
names(HSbinary) <- c("y7","y8","y9")
HS <- cbind(HolzingerSwineford1939, HSbinary)
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ y7 + y8 + y9 '
fit <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE)
## works for factors with exclusively continuous OR categorical indicators
compRelSEM(fit)
## reliability for ALL indicators only available when they are
## all continuous or all categorical
compRelSEM(fit, omit.factors = "speed", return.total = TRUE)
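## A minimal sketch of the general omega formula in Details, computed by hand
## for the "visual" factor (continuous indicators), using the model-implied
## covariance matrix (i.e., obs.var = FALSE):
est <- lavInspect(fit, "est")
ind <- names(which(est$lambda[ , "visual"] != 0))
num <- sum(est$lambda[ind, "visual"])^2 * est$psi["visual", "visual"]
Sigma <- lavInspect(fit, "cov.ov")[ind, ind]
num / sum(Sigma)   # should be comparable to compRelSEM(fit, obs.var = FALSE)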
## loop over visual indicators to calculate alpha if one indicator is removed
for (i in paste0("x", 1:3)) {
cat("Drop ", i, ":\n", sep = "")
print(compRelSEM(fit, omit.factors = c("textual","speed"),
omit.indicators = i, tau.eq = TRUE))
}
## item-total correlations obtainable by adding a composite to the data
HS$Visual <- HS$x1 + HS$x2 + HS$x3
cor(HS$Visual, y = HS[paste0("x", 1:3)])
## comparable to psych::alpha(HS[paste0("x", 1:3)])
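## Coefficient alpha for the same 3 items, computed directly from the observed
## covariance matrix (the alpha formula shown in Details):
S <- cov(HS[paste0("x", 1:3)])
k <- ncol(S)
(k / (k - 1)) * (1 - sum(diag(S)) / sum(S))
## compare with compRelSEM(fit, omit.factors = c("textual","speed"), tau.eq = TRUE)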
## Reliability of a composite that represents a higher-order factor
mod.hi <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
general =~ visual + textual + speed '
fit.hi <- cfa(mod.hi, data = HolzingerSwineford1939)
compRelSEM(fit.hi, higher = "general")
## reliabilities for lower-order composites also returned
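## A sketch of the higher-order omega formula in Details, computed by hand from
## fit.hi (zeros in the Beta matrix cancel the unwanted parts of Phi); compare
## with compRelSEM(fit.hi, higher = "general", obs.var = FALSE):
est <- lavInspect(fit.hi, "est")
Phi <- lavInspect(fit.hi, "cov.lv")
num <- sum(est$lambda %*% est$beta %*% Phi %*% t(est$beta) %*% t(est$lambda))
num / sum(lavInspect(fit.hi, "cov.ov"))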
## works for multigroup models and for multilevel models (and both)
data(Demo.twolevel)
## assign clusters to arbitrary groups
Demo.twolevel$g <- ifelse(Demo.twolevel$cluster %% 2L, "type1", "type2")
model2 <- ' group: type1
level: 1
f1 =~ y1 + L2*y2 + L3*y3
f2 =~ y4 + L5*y5 + L6*y6
level: 2
f1 =~ y1 + L2*y2 + L3*y3
f2 =~ y4 + L5*y5 + L6*y6
group: type2
level: 1
f1 =~ y1 + L2*y2 + L3*y3
f2 =~ y4 + L5*y5 + L6*y6
level: 2
f1 =~ y1 + L2*y2 + L3*y3
f2 =~ y4 + L5*y5 + L6*y6
'
fit2 <- sem(model2, data = Demo.twolevel, cluster = "cluster", group = "g")
compRelSEM(fit2) # Geldhof's indices (hypothetical, for latent components)
## Lai's (2021) indices for Level-1 and configural constructs
compRelSEM(fit2, config = c("f1","f2"))
## Lai's (2021) indices for shared (Level-2) constructs
## (also an interrater reliability coefficient)
compRelSEM(fit2, shared = c("f1","f2"))
## Shared construct using saturated within-level model
mod.sat1 <- ' level: 1
y1 ~~ y1 + y2 + y3 + y4 + y5 + y6
y2 ~~ y2 + y3 + y4 + y5 + y6
y3 ~~ y3 + y4 + y5 + y6
y4 ~~ y4 + y5 + y6
y5 ~~ y5 + y6
y6 ~~ y6
level: 2
f1 =~ y1 + L2*y2 + L3*y3
f2 =~ y4 + L5*y5 + L6*y6
'
fit.sat1 <- sem(mod.sat1, data = Demo.twolevel, cluster = "cluster")
compRelSEM(fit.sat1, shared = c("f1","f2"))
## Simultaneous shared-and-configural model (Stapleton et al, 2016, 2019),
## not recommended, but possible by omitting shared or configural factor.
mod.both <- ' level: 1
fc =~ y1 + L2*y2 + L3*y3 + L4*y4 + L5*y5 + L6*y6
level: 2
## configural construct
fc =~ y1 + L2*y2 + L3*y3 + L4*y4 + L5*y5 + L6*y6
## orthogonal shared construct
fs =~ NA*y1 + y2 + y3 + y4 + y5 + y6
fs ~~ 1*fs + 0*fc
'
fit.both <- sem(mod.both, data = Demo.twolevel, cluster = "cluster")
compRelSEM(fit.both, shared = "fs", config = "fc")
Build an object summarizing fit indices across multiple models
Description
This function creates a template for comparing fit indices across multiple fitted lavaan objects. The results can later be exported to the clipboard or to a file.
Usage
compareFit(..., nested = TRUE, argsLRT = list(), indices = TRUE,
moreIndices = FALSE, baseline.model = NULL, nPrior = 1)
Arguments
... |
fitted |
nested |
|
argsLRT |
|
indices |
|
moreIndices |
|
baseline.model |
optional fitted lavaan::lavaan model passed to
|
nPrior |
passed to |
Value
A FitDiff object that saves model fit comparisons across multiple models. If the models are not nested, only fit indices for each model are returned. If the models are nested, the differences in fit indices are additionally returned, as well as test statistics comparing each sequential pair of models (ordered by their degrees of freedom).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Sunthud Pornprasertmanit (psunthud@gmail.com)
See Also
Examples
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
## non-nested models
fit1 <- cfa(HS.model, data = HolzingerSwineford1939)
m2 <- ' f1 =~ x1 + x2 + x3 + x4
f2 =~ x5 + x6 + x7 + x8 + x9 '
fit2 <- cfa(m2, data = HolzingerSwineford1939)
(out1 <- compareFit(fit1, fit2, nested = FALSE))
summary(out1)
## nested model comparisons: measurement equivalence/invariance
fit.config <- cfa(HS.model, data = HolzingerSwineford1939, group = "school")
fit.metric <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = "loadings")
fit.scalar <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = c("loadings","intercepts"))
fit.strict <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = c("loadings","intercepts","residuals"))
measEqOut <- compareFit(fit.config, fit.metric, fit.scalar, fit.strict,
moreIndices = TRUE) # include moreFitIndices()
summary(measEqOut)
summary(measEqOut, fit.measures = "all")
summary(measEqOut, fit.measures = c("aic", "bic", "sic", "ibic"))
Simulated Dataset to Demonstrate Two-way Latent Interaction
Description
A simulated data set with 2 independent factors and 1 dependent factor, where each factor has three indicators.
Usage
dat2way
Format
A data.frame
with 500 observations of 9 variables.
- x1
The first indicator of the first independent factor
- x2
The second indicator of the first independent factor
- x3
The third indicator of the first independent factor
- x4
The first indicator of the second independent factor
- x5
The second indicator of the second independent factor
- x6
The third indicator of the second independent factor
- x7
The first indicator of the dependent factor
- x8
The second indicator of the dependent factor
- x9
The third indicator of the dependent factor
Source
Data were generated by the MASS::mvrnorm()
function in
the MASS
package.
Examples
head(dat2way)
Simulated Dataset to Demonstrate Three-way Latent Interaction
Description
A simulated data set with 3 independent factors and 1 dependent factor, where each factor has three indicators.
Usage
dat3way
Format
A data.frame
with 500 observations of 12 variables.
- x1
The first indicator of the first independent factor
- x2
The second indicator of the first independent factor
- x3
The third indicator of the first independent factor
- x4
The first indicator of the second independent factor
- x5
The second indicator of the second independent factor
- x6
The third indicator of the second independent factor
- x7
The first indicator of the third independent factor
- x8
The second indicator of the third independent factor
- x9
The third indicator of the third independent factor
- x10
The first indicator of the dependent factor
- x11
The second indicator of the dependent factor
- x12
The third indicator of the dependent factor
Source
Data were generated by the MASS::mvrnorm()
function in
the MASS
package.
Examples
head(dat3way)
Simulated Data set to Demonstrate Categorical Measurement Invariance
Description
A simulated data set with 2 factors, each with 4 indicators, separated into two groups.
Usage
datCat
Format
A data.frame
with 200 observations of 9 variables.
- g
Sex of respondents
- u1
Indicator 1
- u2
Indicator 2
- u3
Indicator 3
- u4
Indicator 4
- u5
Indicator 5
- u6
Indicator 6
- u7
Indicator 7
- u8
Indicator 8
Source
Data were generated using the lavaan
package.
Examples
head(datCat)
Calculate discriminant validity statistics
Description
Calculate discriminant validity statistics based on a fitted lavaan object
Usage
discriminantValidity(object, cutoff = 0.9, merge = FALSE, level = 0.95,
boot.ci.type = "perc")
Arguments
object |
The lavaan::lavaan model object returned by
the |
cutoff |
A cutoff to be used in the constrained models in likelihood ratio tests. |
merge |
Whether the constrained models should be constructed by merging
two factors as one. Implies |
level |
The confidence level required. |
boot.ci.type |
If bootstrapping was used, the type of interval required.
The value should be one of |
Details
Discriminant validity is commonly evaluated at the measurement-scale level by checking whether each pair of latent variables correlates sufficiently below one (in absolute value) that they can be thought of as representing two distinct constructs.
discriminantValidity
function calculates two sets of statistics that
are commonly used in discriminant validity evaluation. The first set are
factor correlation estimates and their confidence intervals. The second set
is a series of nested model tests, where the baseline model is compared
against a set of constrained models that are constructed by constraining
each factor correlation to the specified cutoff one at a time.
The function assumes that the object
is a set of confirmatory
factor analysis results in which the latent variables are scaled by fixing their
variances to 1. If the model is not a CFA model, the function will calculate
the statistics for the correlations among exogenous latent variables, but it
will use the residual (co)variances for endogenous latent variables. If the
latent variables are scaled in some other way (e.g., by fixing the first loadings),
the function issues a warning and re-estimates the model by fixing latent
variances to 1 (and estimating all loadings) so that factor covariances are
already estimated as correlations.
The likelihood ratio tests are done by comparing the original baseline model
against more constrained alternatives. By default, these alternatives are
constructed by fixing each correlation, one at a time, to the cutoff value. The
typical purpose of this test is to demonstrate that the estimated factor
correlation is well below the cutoff, so a significant \chi^2
statistic indicates support for discriminant validity. In some cases, the original
correlation estimate may already be greater than the cutoff, making it
redundant to fit a "restricted" model. When this happens, the likelihood
ratio test will be replaced by comparing the baseline model against itself.
For correlations that are estimated to be negative, a negation of the cutoff
is used in the constrained model.
Another alternative is to do a nested model comparison against a model where
two factors are merged as one, by setting the merge
argument to TRUE
. In this comparison, the constrained model is constructed by
removing one of the correlated factors from the model and assigning its
indicators to the factor that remains in the model.
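For concreteness, a hedged sketch of one such cutoff-constrained comparison, fit by hand for a single factor pair (an illustration of the idea only; discriminantValidity() builds and compares these constrained models internally):
## Hedged sketch: constrain one factor correlation to the cutoff (.9) by hand.
## std.lv = TRUE, so the factor covariance is estimated as a correlation.
library(lavaan)
free.model <- ' visual  =~ x1 + x2 + x3
                textual =~ x4 + x5 + x6 '
con.model  <- ' visual  =~ x1 + x2 + x3
                textual =~ x4 + x5 + x6
                visual ~~ 0.9*textual '
fit.free <- cfa(free.model, data = HolzingerSwineford1939, std.lv = TRUE)
fit.con  <- cfa(con.model,  data = HolzingerSwineford1939, std.lv = TRUE)
lavTestLRT(fit.free, fit.con)  # a significant chi-squared difference supports
                               # discriminant validity for this pair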
Value
A data.frame
of latent variable correlation estimates, their
confidence intervals, and likelihood ratio tests against the constrained models,
with the following attributes:
- baseline
The baseline model after possible rescaling.
- constrained
A
list
of the fitted constrained models used in the likelihood ratio test.
Author(s)
Mikko Rönkkö (University of Jyväskylä; mikko.ronkko@jyu.fi)
References
Rönkkö, M., & Cho, E. (2022). An updated guideline for assessing discriminant validity. Organizational Research Methods, 25(1), 6–14. doi:10.1177/1094428120968614
Examples
library(lavaan)
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)
discriminantValidity(fit)
discriminantValidity(fit, merge = TRUE)
Empirical Kaiser criterion
Description
Identify the number of factors to extract based on the Empirical Kaiser
Criterion (EKC). The analysis can be run on a data.frame
or data
matrix
(data
), or on a correlation or covariance matrix
(sample.cov
) and the sample size (sample.nobs
). A
data.frame
is returned with two columns: the eigenvalues from your
data or covariance matrix and the reference eigenvalues. The number of
factors suggested by the Empirical Kaiser Criterion (i.e. the sample
eigenvalues greater than the reference eigenvalues) and the number of
factors suggested by the original Kaiser Criterion
(i.e., sample eigenvalues > 1) are printed above the output.
Usage
efa.ekc(data = NULL, sample.cov = NULL, sample.nobs = NULL,
missing = "default", ordered = NULL, plot = TRUE)
Arguments
data |
A |
sample.cov |
A covariance or correlation matrix can be used, instead of
|
sample.nobs |
Number of observations (i.e. sample size) if
|
missing |
If |
ordered |
|
plot |
logical. Whether to print a scree plot comparing the sample eigenvalues with the reference eigenvalues. |
Value
A data.frame
showing the sample and reference eigenvalues.
The number of factors suggested by the Empirical Kaiser Criterion (i.e. the sample eigenvalues greater than the reference eigenvalues) is returned as an attribute (see Examples).
The number of factors suggested by the original Kaiser Criterion (i.e.,
sample eigenvalues > 1) is also printed as a header to the data.frame.
Author(s)
Ylenio Longo (University of Nottingham; yleniolongo@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Braeken, J., & van Assen, M. A. L. M. (2017). An empirical Kaiser criterion. Psychological Methods, 22(3), 450–466. doi:10.1037/met0000074
Examples
## Simulate data with 3 factors
model <- '
f1 =~ .3*x1 + .5*x2 + .4*x3
f2 =~ .3*x4 + .5*x5 + .4*x6
f3 =~ .3*x7 + .5*x8 + .4*x9
'
dat <- simulateData(model, seed = 123)
## save summary statistics
myCovMat <- cov(dat)
myCorMat <- cor(dat)
N <- nrow(dat)
## Run the EKC function
(out <- efa.ekc(dat))
## To extract the recommended number of factors using the EKC:
attr(out, "nfactors")
## If you do not have raw data, you can use summary statistics
(x1 <- efa.ekc(sample.cov = myCovMat, sample.nobs = N, plot = FALSE))
(x2 <- efa.ekc(sample.cov = myCorMat, sample.nobs = N, plot = FALSE))
Analyze Unrotated Exploratory Factor Analysis Model
Description
This function will analyze an unrotated exploratory factor analysis model. The
unrotated solution can be rotated by the orthRotate
and oblqRotate
functions.
Usage
efaUnrotate(data = NULL, nf, varList = NULL,
start = TRUE, aux = NULL, ...)
Arguments
data |
A target |
nf |
The desired number of factors |
varList |
Target observed variables. If not specified, all variables in
|
start |
Use starting values in the analysis from the
|
aux |
The list of auxiliary variables. These variables will be included in the model by the saturated-correlates approach to account for missing information. |
... |
Other arguments in the |
Details
This function will generate a lavaan script for an unrotated exploratory factor
analysis model such that (1) all factor loadings are estimated, (2) factor
variances are fixed to 1, (3) factor covariances are fixed to 0, and (4) the
dot products of any pairs of columns in the factor loading matrix are fixed
to zero (Johnson & Wichern, 2002). The reason for creating this function
to supplement the factanal
function is that users can take advantage of
advanced features of the lavaan
package, such as scaled
\chi^2
test statistics, diagonally weighted least squares estimation for ordinal
indicators, or full-information maximum likelihood (FIML) to handle
incomplete data.
Value
A lavaan
output of unrotated exploratory factor analysis
solution.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
Examples
unrotated <- efaUnrotate(HolzingerSwineford1939, nf = 3,
varList = paste0("x", 1:9), estimator = "mlr")
summary(unrotated, std = TRUE)
inspect(unrotated, "std")
dat <- data.frame(HolzingerSwineford1939,
z = rnorm(nrow(HolzingerSwineford1939), 0, 1))
unrotated2 <- efaUnrotate(dat, nf = 2, varList = paste0("x", 1:9), aux = "z")
Simulated Data set to Demonstrate Longitudinal Measurement Invariance
Description
A simulated data set with 1 factor with 3 indicators measured at three timepoints.
Usage
exLong
Format
A data.frame
with 200 observations of 10 variables.
- sex
Sex of respondents
- y1t1
Indicator 1 in Time 1
- y2t1
Indicator 2 in Time 1
- y3t1
Indicator 3 in Time 1
- y1t2
Indicator 1 in Time 2
- y2t2
Indicator 2 in Time 2
- y3t2
Indicator 3 in Time 2
- y1t3
Indicator 1 in Time 3
- y2t3
Indicator 2 in Time 3
- y3t3
Indicator 3 in Time 3
Source
Data were generated using the simsem
package.
Examples
head(exLong)
Find the statistical power based on population RMSEA
Description
Find the proportion of the samples from the sampling distribution of RMSEA under the alternative hypothesis that is rejected by the cutoff derived from the sampling distribution of RMSEA under the null hypothesis. This function can be applied to both the test of close fit and the test of not-close fit (MacCallum, Browne, & Sugawara, 1996).
Usage
findRMSEApower(rmsea0, rmseaA, df, n, alpha = 0.05, group = 1)
Arguments
rmsea0 |
Null RMSEA |
rmseaA |
Alternative RMSEA |
df |
Model degrees of freedom |
n |
Sample size of a dataset |
alpha |
Alpha level used in power calculations |
group |
The number of groups used to calculate RMSEA. |
Details
This function finds the proportion of the sampling distribution derived from the
alternative RMSEA that falls in the critical region derived from the sampling
distribution of the null RMSEA. If rmseaA
is greater than
rmsea0
, the test of close fit is used and the critical region is in
the right-hand tail of the null sampling distribution. On the other hand, if
rmseaA
is less than rmsea0
, the test of not-close fit is used
and the critical region is in the left-hand tail of the null sampling
distribution (MacCallum, Browne, & Sugawara, 1996).
There is also a Shiny app called "power4SEM" that provides a graphical user interface for this functionality (Jak et al., 2021). It can be accessed at https://sjak.shinyapps.io/power4SEM/.
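For intuition, the single-group computation can be sketched with R's noncentral \chi^2 distribution functions, using the MacCallum et al. (1996) noncentrality parameter \lambda = (n - 1) df \varepsilon^2. This is a hedged illustration of the approach, not the exact internals of findRMSEApower(), which also handles multiple groups:
## Hedged sketch of the single-group power computation (illustration only),
## assuming the noncentrality parameter lambda = (n - 1) * df * rmsea^2.
rmsea0 <- .05; rmseaA <- .08; df <- 20; n <- 200; alpha <- .05
ncp0 <- (n - 1) * df * rmsea0^2  # noncentrality under the null RMSEA
ncpA <- (n - 1) * df * rmseaA^2  # noncentrality under the alternative RMSEA
if (rmseaA > rmsea0) {           # test of close fit: reject in the right tail
  crit <- qchisq(1 - alpha, df = df, ncp = ncp0)
  pow  <- pchisq(crit, df = df, ncp = ncpA, lower.tail = FALSE)
} else {                         # test of not-close fit: reject in the left tail
  crit <- qchisq(alpha, df = df, ncp = ncp0)
  pow  <- pchisq(crit, df = df, ncp = ncpA)
}
pow  # compare with findRMSEApower(rmsea0 = .05, rmseaA = .08, df = 20, n = 200)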
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. doi:10.1037/1082-989X.1.2.130
Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L. (2021). Analytical power calculations for structural equation modeling: A tutorial and Shiny app. Behavior Research Methods, 53, 1385–1406. doi:10.3758/s13428-020-01479-0
See Also
-
plotRMSEApower()
to plot the statistical power based on population RMSEA given the sample size -
plotRMSEAdist()
to visualize the RMSEA distributions -
findRMSEAsamplesize()
to find the minimum sample size for a given statistical power based on population RMSEA
Examples
findRMSEApower(rmsea0 = .05, rmseaA = .08, df = 20, n = 200)
Find power given a sample size in nested model comparison
Description
Find the statistical power for a nested-model comparison, based on the null and alternative pairs of population RMSEA values, given a sample size.
Usage
findRMSEApowernested(rmsea0A = NULL, rmsea0B = NULL, rmsea1A,
rmsea1B = NULL, dfA, dfB, n, alpha = 0.05, group = 1)
Arguments
rmsea0A |
The |
rmsea0B |
The |
rmsea1A |
The |
rmsea1B |
The |
dfA |
Degrees of freedom of the more-restricted model |
dfB |
Degrees of freedom of the less-restricted model |
n |
Sample size |
alpha |
The alpha level |
group |
The number of groups used in calculating RMSEA |
Author(s)
Bell Clinton
Pavel Panko (Texas Tech University; pavel.panko@ttu.edu)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19–35. doi:10.1037/1082-989X.11.1.19
See Also
-
plotRMSEApowernested()
to plot the statistical power for nested model comparison based on population RMSEA given the sample size -
findRMSEAsamplesizenested()
to find the minimum sample size for a given statistical power in nested model comparison based on population RMSEA
Examples
findRMSEApowernested(rmsea0A = 0.06, rmsea0B = 0.05, rmsea1A = 0.08,
rmsea1B = 0.05, dfA = 22, dfB = 20, n = 200,
alpha = 0.05, group = 1)
Find the minimum sample size for a given statistical power based on population RMSEA
Description
Find the minimum sample size for a specified statistical power based on population RMSEA. This function can be applied to both the test of close fit and the test of not-close fit (MacCallum, Browne, & Sugawara, 1996).
Usage
findRMSEAsamplesize(rmsea0, rmseaA, df, power = 0.8, alpha = 0.05,
group = 1)
Arguments
rmsea0 |
Null RMSEA |
rmseaA |
Alternative RMSEA |
df |
Model degrees of freedom |
power |
Desired statistical power to reject a misspecified model (test of close fit) or retain a good model (test of not-close fit) |
alpha |
Alpha level used in power calculations |
group |
The number of groups used to calculate RMSEA. |
Details
This function finds the minimum sample size for a specified power based on an
iterative routine. The sample size keeps increasing until the power calculated
by the findRMSEApower()
function is just over the specified
power. If group
is greater than 1, the resulting sample size is the
sample size per group.
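A hedged sketch of such an iterative search, calling findRMSEApower() directly (an illustration only; the function's actual starting value and stopping rule are internal details):
## Hedged sketch: increase n until the computed power first exceeds the target.
rmsea0 <- .05; rmseaA <- .08; df <- 20; target <- .80; alpha <- .05
n <- 10
while (findRMSEApower(rmsea0, rmseaA, df = df, n = n, alpha = alpha) < target) {
  n <- n + 1
}
n  # compare with findRMSEAsamplesize(rmsea0 = .05, rmseaA = .08, df = 20, power = .80)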
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. doi:10.1037/1082-989X.1.2.130
Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L. (2021). Analytical power calculations for structural equation modeling: A tutorial and Shiny app. Behavior Research Methods, 53, 1385–1406. doi:10.3758/s13428-020-01479-0
See Also
-
plotRMSEApower()
to plot the statistical power based on population RMSEA given the sample size -
plotRMSEAdist()
to visualize the RMSEA distributions -
findRMSEApower()
to find the statistical power based on population RMSEA given a sample size
Examples
findRMSEAsamplesize(rmsea0 = .05, rmseaA = .08, df = 20, power = 0.80)
Find sample size given a power in nested model comparison
Description
Find the minimum sample size at which the power for the nested-model comparison, based on the null and alternative pairs of population RMSEA values, just exceeds the specified power.
Usage
findRMSEAsamplesizenested(rmsea0A = NULL, rmsea0B = NULL, rmsea1A,
rmsea1B = NULL, dfA, dfB, power = 0.8, alpha = 0.05, group = 1)
Arguments
rmsea0A |
The |
rmsea0B |
The |
rmsea1A |
The |
rmsea1B |
The |
dfA |
Degrees of freedom of the more-restricted model. |
dfB |
Degrees of freedom of the less-restricted model. |
power |
The desired statistical power. |
alpha |
The alpha level. |
group |
The number of groups used in calculating RMSEA. |
Author(s)
Bell Clinton
Pavel Panko (Texas Tech University; pavel.panko@ttu.edu)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19–35. doi:10.1037/1082-989X.11.1.19
See Also
-
plotRMSEApowernested()
to plot the statistical power for nested model comparison based on population RMSEA given the sample size -
findRMSEApowernested()
to find the power for a given sample size in nested model comparison based on population RMSEA
Examples
findRMSEAsamplesizenested(rmsea0A = 0, rmsea0B = 0, rmsea1A = 0.06,
rmsea1B = 0.05, dfA = 22, dfB = 20, power = 0.80,
alpha = .05, group = 1)
Fraction of Missing Information.
Description
This function estimates the Fraction of Missing Information (FMI) for summary statistics of each variable, using either an incomplete data set or a list of imputed data sets.
Usage
fmi(data, method = "saturated", group = NULL, ords = NULL,
varnames = NULL, exclude = NULL, return.fit = FALSE)
Arguments
data |
Either a single |
method |
character. If |
group |
|
ords |
Optional |
varnames |
Optional |
exclude |
Optional |
return.fit |
logical. If |
Details
The function estimates a saturated model with lavaan::lavaan()
for a
single incomplete data set using FIML, or with lavaan.mi::lavaan.mi()
for a list of imputed data sets. If method = "saturated"
, FMI will be
estimated for all summary statistics, which could take a lot of time with
big data sets. If method = "null"
, FMI will only be estimated for
univariate statistics (e.g., means, variances, thresholds). The saturated
model gives more reliable estimates, so it could also help to request a
subset of variables from a large data set.
Value
fmi()
returns a list with at least 2 of the following:
Covariances |
A list of symmetric matrices: (1) the estimated/pooled
covariance matrix, or a list of group-specific matrices (if applicable)
and (2) a matrix of FMI, or a list of group-specific matrices (if
applicable). Only available if |
Variances |
The estimated/pooled variance for each numeric variable.
Only available if |
Means |
The estimated/pooled mean for each numeric variable. |
Thresholds |
The estimated/pooled threshold(s) for each ordered-categorical variable. |
Author(s)
Mauricio Garnier Villarreal (Vrije Universiteit Amsterdam; m.garniervillarreal@vu.nl)
Terrence Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
Savalei, V. & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from full information maximum likelihood. Structural Equation Modeling, 19(3), 477–494. doi:10.1080/10705511.2012.687669
Wagner, J. (2010). The fraction of missing information as a tool for monitoring the quality of survey data. Public Opinion Quarterly, 74(2), 223–243. doi:10.1093/poq/nfq007
Examples
HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
"ageyr","agemo","school")]
set.seed(12345)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)
## calculate FMI (using FIML, provide partially observed data set)
(out1 <- fmi(HSMiss, exclude = "school"))
(out2 <- fmi(HSMiss, exclude = "school", method = "null"))
(out3 <- fmi(HSMiss, varnames = c("x5","x6","x7","x8","x9")))
(out4 <- fmi(HSMiss, method = "cor", group = "school")) # correlations by group
## significance tests in lavaan(.mi) object
out5 <- fmi(HSMiss, method = "cor", return.fit = TRUE)
summary(out5) # factor loading == SD, covariance = correlation
if(requireNamespace("lavaan.mi")){
## ordered-categorical data
data(binHS5imps, package = "lavaan.mi")
## calculate FMI, using list of imputed data sets
fmi(binHS5imps, group = "school")
}
Wrapper for goric.lavaan()
from the restriktor
package
Description
The goricaSEM()
function is an interface to restriktor::goric.lavaan()
,
allowing users to perform generalized order-restricted information criterion
approximation (GORICA) analysis specifically for structural equation
models fitted using the lavaan package.
Usage
goricaSEM(object, ..., hypotheses = NULL, comparison = NULL,
type = "gorica", standardized = FALSE, debug = FALSE)
Arguments
object |
A lavaan::lavaan object. |
... |
Additional arguments passed to |
hypotheses |
A named |
comparison |
A |
type |
A |
standardized |
|
debug |
|
Details
This function is designed as a wrapper for the restriktor::goric.lavaan()
function. It calculates GORICA values and weights, which can be used to
compare models or hypotheses under inequality constraints.
The hypotheses=
argument allows users to specify constraints in text-based
syntax or matrix notation. For text-based syntax, constraints are specified
as a string (e.g., "a1 > a2"
). For matrix notation, a named list with
$constraints
, $rhs
, and $neq
elements can be provided.
The comparison=
argument determines whether the specified hypothesis is
compared against its "complement"
, the "unconstrained"
model, or
neither ("none"
).
Value
A list
containing the results of the goric.lavaan
function,
including:
- The log-likelihood.
- The penalty term.
- GORIC(A) values and weights.
- Relative GORIC(A) weights.
Author(s)
Leonard Vanbrabant and Rebecca Kuiper
References
Kuiper, R. M., Hoijtink, H., & Silvapulle, M. J. (2011). An Akaike-type information criterion for model selection under inequality constraints. Biometrika, 98(2), 495–501. doi:10.1093/biomet/asr002
Vanbrabant, L., Van Loey, N., & Kuiper, R. M. (2020). Evaluating a theory-based hypothesis against its complement using an AIC-type information criterion with an application to facial burn injury. Psychological Methods, 25(2), 129–142. doi:10.1037/met0000238
See Also
Examples
## Example: Perform GORICA analysis on a lavaan model
library(lavaan)
library(restriktor)
## Define the SEM model
model <- '
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + a1*y2 + b1*y3 + c1*y4
dem65 =~ y5 + a2*y6 + b2*y7 + c2*y8
dem60 ~ ind60
dem65 ~ ind60 + dem60
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
## Fit the model
data(PoliticalDemocracy)
fit <- sem(model, data = PoliticalDemocracy)
## Define hypotheses
myHypothesis <- 'a1 > a2, b1 > b2, c1 > c2'
## Perform GORICA analysis
result <- goricaSEM(fit, hypotheses = list(H1 = myHypothesis),
standardized = FALSE, comparison = "complement",
type = "gorica")
## Print result
print(result)
Assessing Discriminant Validity using Heterotrait–Monotrait Ratio
Description
This function assesses discriminant validity through the
heterotrait-monotrait ratio (HTMT) of the correlations (Henseler, Ringle, &
Sarstedt, 2015). Specifically, it assesses the arithmetic (Henseler et al.,
2015) or geometric (Roemer et al., 2021) mean correlation
among indicators across constructs (i.e., heterotrait–heteromethod
correlations) relative to the geometric-mean correlation among indicators
within the same construct (i.e., monotrait–heteromethod correlations).
The resulting HTMT(2) values are interpreted as estimates of inter-construct
correlations. Absolute values of the correlations are recommended to
calculate the HTMT matrix, and are required to calculate HTMT2. Correlations
are estimated using the lavaan::lavCor()
function.
Usage
htmt(model, data = NULL, sample.cov = NULL, missing = "listwise",
ordered = NULL, absolute = TRUE, htmt2 = TRUE)
Arguments
model |
lavaan |
data |
A |
sample.cov |
A covariance or correlation matrix can be used, instead of
|
missing |
If |
ordered |
Character vector. Only used if object is a |
absolute |
|
htmt2 |
|
Value
A matrix showing HTMT(2) values (i.e., discriminant validity) between each pair of factors.
Author(s)
Ylenio Longo (University of Nottingham; yleniolongo@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43(1), 115–135. doi:10.1007/s11747-014-0403-8
Roemer, E., Schuberth, F., & Henseler, J. (2021). HTMT2—An improved criterion for assessing discriminant validity in structural equation modeling. Industrial Management & Data Systems, 121(21), 2637–2650. doi:10.1108/IMDS-02-2021-0082
Voorhees, C. M., Brady, M. K., Calantone, R., & Ramirez, E. (2016). Discriminant validity testing in marketing: An analysis, causes for concern, and proposed remedies. Journal of the Academy of Marketing Science, 44(1), 119–134. doi:10.1007/s11747-015-0455-4
Examples
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
dat <- HolzingerSwineford1939[, paste0("x", 1:9)]
htmt(HS.model, dat)
## save covariance matrix
HS.cov <- cov(HolzingerSwineford1939[, paste0("x", 1:9)])
## HTMT using arithmetic mean
htmt(HS.model, sample.cov = HS.cov, htmt2 = FALSE)
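For intuition, the arithmetic-mean HTMT for a single pair of factors can be computed by hand from the observed Pearson correlations. This is a hedged sketch of the formula described above; htmt() itself estimates correlations with lavaan::lavCor() and returns all factor pairs:
## Hedged sketch: arithmetic-mean HTMT for the visual/textual pair (illustration).
R <- abs(cor(HolzingerSwineford1939[paste0("x", 1:6)]))
hetero   <- mean(R[1:3, 4:6])                         # heterotrait-heteromethod mean
mono.vis <- mean(R[1:3, 1:3][lower.tri(R[1:3, 1:3])]) # monotrait mean, visual
mono.txt <- mean(R[4:6, 4:6][lower.tri(R[4:6, 4:6])]) # monotrait mean, textual
hetero / sqrt(mono.vis * mono.txt)  # compare with htmt(HS.model, dat, htmt2 = FALSE)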
Specify starting values from a lavaan output
Description
This function will save the parameter estimates of a lavaan output and impose those parameter estimates as starting values for another analysis model. Free parameters with the same names or the same labels across the two models will be assigned the new starting values. This function may help to increase the chance of convergence in a complex model (e.g., a multitrait-multimethod model or a complex longitudinal invariance model).
Usage
imposeStart(out, expr, silent = TRUE)
Arguments
out |
The |
expr |
The original code that users use to run a lavaan model |
silent |
Logical to print the parameter table with new starting values |
Value
A fitted lavaan model
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Examples
## The following example shows that the longitudinal weak invariance model
## using effect coding did not converge with three time points but did converge
## with two time points. Thus, the parameter estimates from the model with
## two time points are used as starting values for the three-time-point model.
## The model with new starting values converges properly.
weak2time <- '
# Loadings
f1t1 =~ LOAD1*y1t1 + LOAD2*y2t1 + LOAD3*y3t1
f1t2 =~ LOAD1*y1t2 + LOAD2*y2t2 + LOAD3*y3t2
# Factor Variances
f1t1 ~~ f1t1
f1t2 ~~ f1t2
# Factor Covariances
f1t1 ~~ f1t2
# Error Variances
y1t1 ~~ y1t1
y2t1 ~~ y2t1
y3t1 ~~ y3t1
y1t2 ~~ y1t2
y2t2 ~~ y2t2
y3t2 ~~ y3t2
# Error Covariances
y1t1 ~~ y1t2
y2t1 ~~ y2t2
y3t1 ~~ y3t2
# Factor Means
f1t1 ~ NA*1
f1t2 ~ NA*1
# Measurement Intercepts
y1t1 ~ INT1*1
y2t1 ~ INT2*1
y3t1 ~ INT3*1
y1t2 ~ INT4*1
y2t2 ~ INT5*1
y3t2 ~ INT6*1
# Constraints for Effect-coding Identification
LOAD1 == 3 - LOAD2 - LOAD3
INT1 == 0 - INT2 - INT3
INT4 == 0 - INT5 - INT6
'
model2time <- lavaan(weak2time, data = exLong)
weak3time <- '
# Loadings
f1t1 =~ LOAD1*y1t1 + LOAD2*y2t1 + LOAD3*y3t1
f1t2 =~ LOAD1*y1t2 + LOAD2*y2t2 + LOAD3*y3t2
f1t3 =~ LOAD1*y1t3 + LOAD2*y2t3 + LOAD3*y3t3
# Factor Variances
f1t1 ~~ f1t1
f1t2 ~~ f1t2
f1t3 ~~ f1t3
# Factor Covariances
f1t1 ~~ f1t2 + f1t3
f1t2 ~~ f1t3
# Error Variances
y1t1 ~~ y1t1
y2t1 ~~ y2t1
y3t1 ~~ y3t1
y1t2 ~~ y1t2
y2t2 ~~ y2t2
y3t2 ~~ y3t2
y1t3 ~~ y1t3
y2t3 ~~ y2t3
y3t3 ~~ y3t3
# Error Covariances
y1t1 ~~ y1t2
y2t1 ~~ y2t2
y3t1 ~~ y3t2
y1t1 ~~ y1t3
y2t1 ~~ y2t3
y3t1 ~~ y3t3
y1t2 ~~ y1t3
y2t2 ~~ y2t3
y3t2 ~~ y3t3
# Factor Means
f1t1 ~ NA*1
f1t2 ~ NA*1
f1t3 ~ NA*1
# Measurement Intercepts
y1t1 ~ INT1*1
y2t1 ~ INT2*1
y3t1 ~ INT3*1
y1t2 ~ INT4*1
y2t2 ~ INT5*1
y3t2 ~ INT6*1
y1t3 ~ INT7*1
y2t3 ~ INT8*1
y3t3 ~ INT9*1
# Constraints for Effect-coding Identification
LOAD1 == 3 - LOAD2 - LOAD3
INT1 == 0 - INT2 - INT3
INT4 == 0 - INT5 - INT6
INT7 == 0 - INT8 - INT9
'
### The following command does not provide convergent result
# model3time <- lavaan(weak3time, data = exLong)
### Use starting values from the model with two time points
model3time <- imposeStart(model2time, lavaan(weak3time, data = exLong))
summary(model3time)
Make products of indicators using no centering, mean centering, double-mean centering, or residual centering
Description
The indProd
function will make products of indicators using no
centering, mean centering, double-mean centering, or residual centering. The
orthogonalize
function is a shortcut for the indProd
function
that makes residual-centered indicator products.
Usage
indProd(data, var1, var2, var3 = NULL, match = TRUE, meanC = TRUE,
residualC = FALSE, doubleMC = TRUE, namesProd = NULL)
orthogonalize(data, var1, var2, var3 = NULL, match = TRUE,
namesProd = NULL)
Arguments
data |
The desired data to be transformed. |
var1 |
Names or indices of the variables loaded on the first factor |
var2 |
Names or indices of the variables loaded on the second factor |
var3 |
Names or indices of the variables loaded on the third factor (for three-way interaction) |
match |
Specify |
meanC |
Specify |
residualC |
Specify |
doubleMC |
Specify |
namesProd |
The names of resulting products |
Value
The original data attached with the products.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com) Alexander Schoemann (East Carolina University; schoemanna@ecu.edu)
References
Marsh, H. W., Wen, Z. & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. doi:10.1037/1082-989X.9.3.275
Lin, G. C., Wen, Z., Marsh, H. W., & Lin, H. S. (2010). Structural equation models of latent interactions: Clarification of orthogonalizing and double-mean-centering strategies. Structural Equation Modeling, 17(3), 374–391. doi:10.1080/10705511.2010.488999
Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables. Structural Equation Modeling, 13(4), 497–519. doi:10.1207/s15328007sem1304_1
See Also
-
probe2WayMC()
For probing the two-way latent interaction when the results are obtained from mean-centering, or double-mean centering. -
probe3WayMC()
For probing the three-way latent interaction when the results are obtained from mean-centering, or double-mean centering. -
probe2WayRC()
For probing the two-way latent interaction when the results are obtained from residual-centering approach. -
probe3WayRC()
For probing the two-way latent interaction when the results are obtained from residual-centering approach. -
plotProbe()
Plot the simple intercepts and slopes of the latent interaction.
Examples
## Mean centering / two-way interaction / match-paired
dat <- indProd(attitude[ , -1], var1 = 1:3, var2 = 4:6)
## Residual centering / two-way interaction / match-paired
dat2 <- indProd(attitude[ , -1], var1 = 1:3, var2 = 4:6, match = FALSE,
meanC = FALSE, residualC = TRUE, doubleMC = FALSE)
## Double-mean centering / two-way interaction / match-paired
dat3 <- indProd(attitude[ , -1], var1 = 1:3, var2 = 4:6, match = FALSE,
meanC = TRUE, residualC = FALSE, doubleMC = TRUE)
## Mean centering / three-way interaction / match-paired
dat4 <- indProd(attitude[ , -1], var1 = 1:2, var2 = 3:4, var3 = 5:6)
## Residual centering / three-way interaction / match-paired
dat5 <- orthogonalize(attitude[ , -1], var1 = 1:2, var2 = 3:4, var3 = 5:6,
match = FALSE)
## Double-mean centering / three-way interaction / match-paired
dat6 <- indProd(attitude[ , -1], var1 = 1:2, var2 = 3:4, var3 = 5:6,
match = FALSE, meanC = TRUE, residualC = TRUE,
doubleMC = TRUE)
## To add product-indicators to multiple-imputed data sets
HSMiss <- HolzingerSwineford1939[ , c(paste0("x", 1:9), "ageyr","agemo")]
set.seed(12345)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)
library(Amelia)
set.seed(12345)
HS.amelia <- amelia(HSMiss, m = 3, p2s = FALSE)
imps <- HS.amelia$imputations # extract a list of imputations
## apply indProd() to the list of data.frames
imps2 <- lapply(imps, indProd,
var1 = c("x1","x2","x3"), var2 = c("x4","x5","x6"))
## verify:
lapply(imps2, head)
Generate data via the Kaiser-Dickman (1962) algorithm.
Description
Given a covariance matrix and sample size, generate raw data that correspond to the covariance matrix. Data can be generated to match the covariance matrix exactly, or to be a sample from the population covariance matrix.
Usage
kd(covmat, n, type = c("exact", "sample"))
Arguments
covmat |
a symmetric, positive definite covariance matrix |
n |
the sample size for the data that will be generated |
type |
type of data generation. |
Details
By default, R's cov()
function divides by n - 1. The data
generated by this algorithm result in a covariance matrix that matches
covmat
, but you must divide by n instead of n - 1.
Value
kd
returns a data matrix of dimension n
by
nrow(covmat)
.
Author(s)
Ed Merkle (University of Missouri; merklee@missouri.edu)
References
Kaiser, H. F. and Dickman, K. (1962). Sample and population score matrices and sample correlation matrices from an arbitrary population correlation matrix. Psychometrika, 27(2), 179–182. doi:10.1007/BF02289635
Examples
#### First Example
## Get data
dat <- HolzingerSwineford1939[ , 7:15]
hs.n <- nrow(dat)
## Covariance matrix divided by n
hscov <- ((hs.n-1)/hs.n) * cov(dat)
## Generate new, raw data corresponding to hscov
newdat <- kd(hscov, hs.n)
## Difference between new covariance matrix and hscov is minimal
newcov <- (hs.n-1)/hs.n * cov(newdat)
summary(as.numeric(hscov - newcov))
## Generate sample data, treating hscov as population matrix
newdat2 <- kd(hscov, hs.n, type = "sample")
#### Another example
## Define a covariance matrix
covmat <- matrix(0, 3, 3)
diag(covmat) <- 1.5
covmat[2:3,1] <- c(1.3, 1.7)
covmat[3,2] <- 2.1
covmat <- covmat + t(covmat)
## Generate data of size 300 that have this covariance matrix
rawdat <- kd(covmat, 300)
## Covariances are exact if we compute sample covariance matrix by
## dividing by n (vs by n - 1)
summary(as.numeric((299/300)*cov(rawdat) - covmat))
## Generate data of size 300 where covmat is the population covariance matrix
rawdat2 <- kd(covmat, 300, type = "sample")
Finding excessive kurtosis
Description
Finding excessive kurtosis (g_{2}
) of an object
Usage
kurtosis(object, population = FALSE)
Arguments
object |
A vector used to find an excessive kurtosis |
population |
|
Details
The excessive kurtosis computed by default is g_{2}, the fourth
standardized moment of the empirical distribution of object.
The population parameter excessive kurtosis \gamma_{2} is
\gamma_{2} = \frac{\mu_{4}}{\mu^{2}_{2}} - 3,
where \mu_{i} denotes the i-th central moment.
The sample statistic g_{2} is
g_{2} = \frac{k_{4}}{k^{2}_{2}} - 3,
where k_{i} is the i-th k-statistic.
The sampling variance of the excessive kurtosis is
Var(\hat{g}_{2}) = \frac{24}{N},
where N is the sample size.
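As a quick check, the population-style estimator can be computed directly from sample central moments (a hedged sketch; kurtosis() uses k-statistics, so results will differ slightly in small samples):
## Hedged sketch: excess kurtosis from central moments (illustration only).
x  <- rnorm(1000)
m2 <- mean((x - mean(x))^2)  # second central moment
m4 <- mean((x - mean(x))^4)  # fourth central moment
m4 / m2^2 - 3                # compare with kurtosis(x)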
Value
The estimate of excessive kurtosis, with a test statistic if
population is specified as FALSE
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Weisstein, Eric W. (n.d.). Kurtosis. Retrieved from MathWorld–A Wolfram Web Resource: http://mathworld.wolfram.com/Kurtosis.html
See Also
-
skew()
Find the univariate skewness of a variable -
mardiaSkew()
Find the Mardia's multivariate skewness of a set of variables -
mardiaKurtosis()
Find the Mardia's multivariate kurtosis of a set of variables
Examples
kurtosis(1:5)
Likelihood Ratio Test for Multiple Imputations
Description
Likelihood ratio test (LRT) for lavaan models fitted to multiple imputed data sets. Statistics for comparing nested models can be calculated by pooling the likelihood ratios across imputed data sets, as described by Meng & Rubin (1992), or by pooling the LRT statistics from each imputation, as described by Li, Meng, Raghunathan, & Rubin (1991).
Usage
lavTestLRT.mi(object, h1 = NULL, test = c("D3","D2"),
omit.imps = c("no.conv","no.se"),
asymptotic = FALSE, pool.robust = FALSE, ...)
Arguments
object , h1 |
An object of class OLDlavaan.mi.
|
test |
|
omit.imps |
|
asymptotic |
|
pool.robust |
|
... |
Additional arguments passed to |
Details
The Meng & Rubin (1992) method, also referred to as the "D3"
statistic, is only applicable when using a likelihood-based estimator.
Otherwise (e.g., DWLS for categorical outcomes), users are notified that
test
was set to "D2"
.
test = "Mplus"
implies "D3"
and asymptotic = TRUE
(see Asparouhov & Muthen, 2010).
Note that unlike lavaan::lavTestLRT()
, lavTestLRT.mi()
can
only be used to compare a single pair of models, not a longer list of
models. To compare several nested models fitted to multiple imputations,
see examples on the compareFit()
help page.
Value
A vector containing the LRT statistic (either an F
or \chi^2
statistic, depending on the asymptotic
argument), its degrees of
freedom (numerator and denominator, if asymptotic = FALSE
), its
p value, and 2 missing-data diagnostics: the relative increase
in variance (RIV, or average for multiparameter tests: ARIV) and the
fraction missing information (FMI = ARIV / (1 + ARIV)). Robust statistics
will also include the average (across imputations) scaling factor and
(if relevant) shift parameter(s), unless pool.robust = TRUE
.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Meng, X.-L., & Rubin, D. B. (1992). Performing likelihood ratio tests with multiply-imputed data sets. Biometrika, 79(1), 103–111. doi:10.2307/2337151
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
See Also
lavaan::lavTestLRT()
, compareFit()
Examples
## See the new lavaan.mi package
Score Test for Multiple Imputations
Description
Score test (or "Lagrange multiplier" test) for lavaan models fitted to multiple imputed data sets. Statistics for releasing one or more fixed or constrained parameters in model can be calculated by pooling the gradient and information matrices pooled across imputed data sets in a method proposed by Mansolf, Jorgensen, & Enders (2020)—analogous to the "D1" Wald test proposed by Li, Meng, Raghunathan, & Rubin's (1991)—or by pooling the complete-data score-test statistics across imputed data sets (i.e., "D2"; Li et al., 1991).
Usage
lavTestScore.mi(object, add = NULL, release = NULL,
test = c("D2","D1"), scale.W = !asymptotic,
omit.imps = c("no.conv","no.se"),
asymptotic = is.null(add),
univariate = TRUE, cumulative = FALSE,
epc = FALSE, standardized = epc, cov.std = epc,
verbose = FALSE, warn = TRUE, information = "expected")
Arguments
object |
An object of class OLDlavaan.mi. |
add |
Either a |
release |
Vector of |
test |
|
scale.W |
|
omit.imps |
|
asymptotic |
|
univariate |
|
cumulative |
|
epc |
|
standardized |
If |
cov.std |
|
verbose |
|
warn |
|
information |
|
Value
A list containing at least one data.frame:
- $test: The total score test, with columns for the score test statistic (X2), its degrees of freedom (df), and its p value under the \chi^2 distribution (p.value). If asymptotic=FALSE, the average relative increase in variance (ARIV) used to calculate the denominator df is also returned as a missing-data diagnostic, along with the fraction missing information (FMI = ARIV / (1 + ARIV)).
- $uni: Optional (if univariate=TRUE). Each 1-df score test, equivalent to modification indices. Also includes EPCs if epc=TRUE, and RIV and FMI if asymptotic=FALSE.
- $cumulative: Optional (if cumulative=TRUE). Cumulative score tests, with ARIV and FMI if asymptotic=FALSE.
- $epc: Optional (if epc=TRUE). Parameter estimates, expected parameter changes, and expected parameter values if ALL the tested constraints were freed.
See lavaan::lavTestScore()
for details.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Adapted from lavaan source code, written by Yves Rosseel (Ghent University; Yves.Rosseel@UGent.be)
test = "D1"
method proposed by
Maxwell Mansolf (University of California, Los Angeles;
mamansolf@gmail.com)
References
Bentler, P. M., & Chou, C.-P. (1992). Some new covariance structure model improvement statistics. Sociological Methods & Research, 21(2), 259–282. doi:10.1177/0049124192021002006
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Mansolf, M., Jorgensen, T. D., & Enders, C. K. (2020). A multiple imputation score test for model modification in structural equation models. Psychological Methods, 25(4), 393–411. doi:10.1037/met0000243
See Also
Examples
## See the new lavaan.mi package
Wald Test for Multiple Imputations
Description
Wald test for testing a linear hypothesis about the parameters of lavaan models fitted to multiple imputed data sets. Statistics for constraining one or more free parameters in a model can be calculated from the pooled point estimates and asymptotic covariance matrix of model parameters using Rubin's (1987) rules, or by pooling the Wald test statistics across imputed data sets (Li, Meng, Raghunathan, & Rubin, 1991).
Usage
lavTestWald.mi(object, constraints = NULL, test = c("D1","D2"),
asymptotic = FALSE, scale.W = !asymptotic,
omit.imps = c("no.conv","no.se"),
verbose = FALSE, warn = TRUE)
Arguments
object |
An object of class OLDlavaan.mi. |
constraints |
A |
test |
|
asymptotic |
|
scale.W |
|
omit.imps |
|
verbose |
|
warn |
|
Details
The constraints are specified using the "=="
operator.
Both the left-hand side and the right-hand side of the equality can contain
a linear combination of model parameters, or a constant (like zero).
The model parameters must be specified by their user-defined labels in the
lavaan model syntax. Names of defined parameters
(using the ":=" operator) can be included too.
Value
A vector containing the Wald test statistic (either an F
or
\chi^2
statistic, depending on the asymptotic
argument),
the degrees of freedom (numerator and denominator, if
asymptotic = FALSE
), and a p value. If
asymptotic = FALSE
, the relative increase in variance (RIV, or
average for multiparameter tests: ARIV) used to calculate the denominator
df is also returned as a missing-data diagnostic, along with the
fraction missing information (FMI = ARIV / (1 + ARIV)).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Adapted from lavaan source code, written by Yves Rosseel (Ghent University; Yves.Rosseel@UGent.be)
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
See Also
Examples
## See the new lavaan.mi package
emmeans
Support Functions for lavaan
Models
Description
Provide emmeans support for lavaan objects
Usage
recover_data.lavaan(object, lavaan.DV, data = NULL, ...)
emm_basis.lavaan(object, trms, xlev, grid, lavaan.DV, ...)
Arguments
object |
An object of class |
lavaan.DV |
|
data |
An optional |
... |
Further arguments passed to |
trms , xlev , grid |
See |
Details
Supported DVs
lavaan.DV
must be an endogenous variable, by appearing on
the left-hand side of either a regression operator ("~"
)
or an intercept operator ("~1"
), or both.
lavaan.DV
can also be a vector of endogenous variables, in which
case they will be treated by emmeans
as a multivariate outcome
(often, this indicates repeated measures) represented by an additional
factor named rep.meas
by default. The mult.name=
argument
can be used to overwrite this default name.
Unsupported Models
This functionality does not support the following models:
Multi-level models are not supported.
Models not fit to a
data.frame
(i.e., models fit to a covariance matrix).
Dealing with Fixed Parameters
Fixed parameters (set with lavaan
's modifiers) are treated as-is:
their values are set by the users, and they have a SE of 0 (as such,
they do not co-vary with any other parameter).
Dealing with Multigroup Models
If a multigroup model is supplied, a factor is added to the reference grid,
the name matching the group
argument supplied when fitting the model.
Note that you must set nesting = NULL
.
Dealing with Missing Data
Limited testing suggests that these functions do work when the model was fit to incomplete data.
Dealing with Factors
By default emmeans
recognizes binary variables (0,1) as a "factor"
with two levels (and not a continuous variable). With some clever contrast
definitions it should be possible to get the desired emmeans / contrasts.
See the example below.
Author(s)
Mattan S. Ben-Shachar (Ben-Gurion University of the Negev; matanshm@post.bgu.ac.il)
Examples
## Not run:
library(lavaan)
library(emmeans)
#### Moderation Analysis ####
mean_sd <- function(x) mean(x) + c(-sd(x), 0, sd(x))
model <- '
# regressions
Sepal.Length ~ b1 * Sepal.Width + b2 * Petal.Length + b3 * Sepal.Width:Petal.Length
# define mean parameter label for centered math for use in simple slopes
Sepal.Width ~ Sepal.Width.mean * 1
# define variance parameter label for centered math for use in simple slopes
Sepal.Width ~~ Sepal.Width.var * Sepal.Width
# simple slopes for condition effect
SD.below := b2 + b3 * (Sepal.Width.mean - sqrt(Sepal.Width.var))
mean := b2 + b3 * (Sepal.Width.mean)
SD.above := b2 + b3 * (Sepal.Width.mean + sqrt(Sepal.Width.var))
'
semFit <- sem(model = model,
data = iris)
## Compare simple slopes
# From `emtrends`
test(
emtrends(semFit, ~ Sepal.Width, "Petal.Length",
lavaan.DV = "Sepal.Length",
cov.red = mean_sd)
)
# From lavaan
parameterEstimates(semFit, output = "pretty")[13:15, ]
# Identical slopes.
# SEs differ due to lavaan estimating uncertainty of the mean / SD
# of Sepal.Width, whereas emmeans uses the mean+-SD as is (fixed).
#### Latent DV ####
model <- '
LAT1 =~ Sepal.Length + Sepal.Width
LAT1 ~ b1 * Petal.Width + 1 * Petal.Length
Petal.Length ~ Petal.Length.mean * 1
V1 := 1 * Petal.Length.mean + 1 * b1
V2 := 1 * Petal.Length.mean + 2 * b1
'
semFit <- sem(model = model,
data = iris, std.lv = TRUE)
## Compare emmeans
# From emmeans
test(
emmeans(semFit, ~ Petal.Width,
lavaan.DV = "LAT1",
at = list(Petal.Width = 1:2))
)
# From lavaan
parameterEstimates(semFit, output = "pretty")[15:16, ]
# Identical means.
# SEs differ due to lavaan estimating uncertainty of the mean
# of Petal.Length, whereas emmeans uses the mean as is.
#### Multi-Variate DV ####
model <- '
ind60 =~ x1 + x2 + x3
# metric invariance
dem60 =~ y1 + a*y2 + b*y3 + c*y4
dem65 =~ y5 + a*y6 + b*y7 + c*y8
# scalar invariance
y1 + y5 ~ d*1
y2 + y6 ~ e*1
y3 + y7 ~ f*1
y4 + y8 ~ g*1
# regressions (slopes differ: interaction with time)
dem60 ~ b1*ind60
dem65 ~ b2*ind60 + NA*1 + Mean.Diff*1
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
# conditional mean differences (besides mean(ind60) == 0)
low := (-1*b2 + Mean.Diff) - (-1*b1) # 1 SD below M
high := (b2 + Mean.Diff) - b1 # 1 SD above M
'
semFit <- sem(model, data = PoliticalDemocracy)
## Compare contrasts
# From emmeans
emmeans(semFit, pairwise ~ rep.meas|ind60,
lavaan.DV = c("dem60","dem65"),
at = list(ind60 = c(-1,1)))[[2]]
# From lavaan
parameterEstimates(semFit, output = "pretty")[49:50, ]
#### Multi Group ####
model <- 'x1 ~ c(int1, int2)*1 + c(b1, b2)*ageyr
diff_11 := (int2 + b2*11) - (int1 + b1*11)
diff_13 := (int2 + b2*13) - (int1 + b1*13)
diff_15 := (int2 + b2*15) - (int1 + b1*15)
'
semFit <- sem(model, group = "school", data = HolzingerSwineford1939)
## Compare contrasts
# From emmeans (note `nesting = NULL`)
emmeans(semFit, pairwise ~ school | ageyr, lavaan.DV = "x1",
at = list(ageyr = c(11, 13, 15)), nesting = NULL)[[2]]
# From lavaan
parameterEstimates(semFit, output = "pretty")
#### Dealing with factors ####
warpbreaks <- cbind(warpbreaks,
model.matrix(~ wool + tension, data = warpbreaks))
model <- "
# Split for convenience
breaks ~ 1
breaks ~ woolB
breaks ~ tensionM + tensionH
breaks ~ woolB:tensionM + woolB:tensionH
"
semFit <- sem(model, warpbreaks)
## Compare contrasts
# From lm -> emmeans
lmFit <- lm(breaks ~ wool * tension, data = warpbreaks)
lmEM <- emmeans(lmFit, ~ tension + wool)
contrast(lmEM, method = data.frame(L_all = c(-1, .05, 0.5),
M_H = c(0, 1, -1)), by = "wool")
# From lavaan -> emmeans
lavEM <- emmeans(semFit, ~ tensionM + tensionH + woolB,
lavaan.DV = "breaks")
contrast(lavEM,
method = list(
"L_all|A" = c(c(-1, .05, 0.5, 0), rep(0, 4)),
"M_H |A" = c(c(0, 1, -1, 0), rep(0, 4)),
"L_all|A" = c(rep(0, 4), c(-1, .05, 0.5, 0)),
"M_H |A" = c(rep(0, 4), c(0, 1, -1, 0))
))
## End(Not run)
Find standardized factor loading from coefficient alpha
Description
Find standardized factor loading from coefficient alpha assuming that all items have equal loadings.
Usage
loadingFromAlpha(alpha, ni)
Arguments
alpha |
A desired coefficient alpha value. |
ni |
A desired number of items. |
Value
result |
The standardized factor loadings that make desired coefficient alpha with specified number of items. |
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Examples
loadingFromAlpha(0.8, 4)
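Under the equal-loadings assumption, this has a closed form: solve alpha = ni*r / (1 + (ni - 1)*r) for the common inter-item correlation r, then take the standardized loading as sqrt(r). A hedged sketch, assumed to mirror what loadingFromAlpha() computes:
## Hedged sketch of the closed-form calculation (illustration only).
alpha <- 0.8; ni <- 4
r <- alpha / (ni - alpha * (ni - 1))  # common inter-item correlation
sqrt(r)                               # compare with loadingFromAlpha(0.8, 4)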
Measurement Invariance Tests Within Person
Description
Test measurement invariance across timepoints (longitudinal) or in any context involving repeated use of the same scale within a case (e.g., a dyad in which husband and wife answer the same scale). The measurement invariance analysis uses a typical sequence of model comparison tests. This function currently works with only one scale, and only with continuous indicators.
Usage
longInvariance(model, varList, auto = "all", constrainAuto = FALSE,
fixed.x = TRUE, std.lv = FALSE, group = NULL,
group.equal = "", group.partial = "", strict = FALSE,
warn = TRUE, debug = FALSE, quiet = FALSE,
fit.measures = "default", baseline.model = NULL,
method = "satorra.bentler.2001", ...)
Arguments
model |
lavaan syntax or parameter table |
varList |
A list containing indicator names of factors used in the invariance testing, such as the list that the first element is the vector of indicator names in the first timepoint and the second element is the vector of indicator names in the second timepoint. The order of indicator names should be the same (but measured in different times or different units). |
auto |
The order of autocorrelation on the measurement errors of the
similar items across factors (e.g., Item 1 in Time 1 and Time 2). If 0 is
specified, the autocorrelation will not be imposed. If 1 is specified,
the autocorrelation will be imposed for the adjacent factors listed in
|
constrainAuto |
If |
fixed.x |
See |
std.lv |
See |
group |
See |
group.equal |
See |
group.partial |
See |
strict |
If |
warn |
See |
debug |
See |
quiet |
If |
fit.measures |
Fit measures used to calculate the differences between nested models. |
baseline.model |
custom baseline model passed to
|
method |
The method used to calculate likelihood ratio test. See
|
... |
Additional arguments in the |
Details
If strict = FALSE
, the following four models are tested in order:
Model 1: configural invariance. The same factor structure is imposed on all units.
Model 2: weak invariance. The factor loadings are constrained to be equal across units.
Model 3: strong invariance. The factor loadings and intercepts are constrained to be equal across units.
Model 4: The factor loadings, intercepts and means are constrained to be equal across units.
Each time a more restricted model is fitted, a \Delta\chi^2
test is
reported, comparing the current model with the previous one, and comparing
the current model to the baseline model (Model 1). In addition, the
difference in CFI is also reported (\Delta
CFI).
If strict = TRUE
, the following five models are tested in order:
Model 1: configural invariance. The same factor structure is imposed on all units.
Model 2: weak invariance. The factor loadings are constrained to be equal across units.
Model 3: strong invariance. The factor loadings and intercepts are constrained to be equal across units.
Model 4: strict invariance. The factor loadings, intercepts and residual variances are constrained to be equal across units.
Model 5: The factor loadings, intercepts, residual variances and means are constrained to be equal across units.
Note that if the \chi^2
test statistic is scaled (e.g., a Satorra-Bentler
or Yuan-Bentler test statistic), a special version of the \Delta\chi^2
test is used, as described at http://www.statmodel.com/chidiff.shtml
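A hedged sketch of that scaled difference test (Satorra & Bentler, 2001), computed by hand from two hypothetical nested lavaan fits, fit0 (more restricted) and fit1 (less restricted), estimated with a scaled test statistic; lavaan::lavTestLRT() performs this automatically:
## Hedged sketch: Satorra-Bentler (2001) scaled chi-squared difference test.
## fit0 (more restricted) and fit1 (less restricted) are hypothetical lavaan
## fits estimated with a scaled test statistic (e.g., estimator = "MLM").
T0 <- fitMeasures(fit0, "chisq"); d0 <- fitMeasures(fit0, "df")
T1 <- fitMeasures(fit1, "chisq"); d1 <- fitMeasures(fit1, "df")
c0 <- fitMeasures(fit0, "chisq.scaling.factor")
c1 <- fitMeasures(fit1, "chisq.scaling.factor")
cd  <- (d0 * c0 - d1 * c1) / (d0 - d1)  # scaling factor for the difference
TRd <- (T0 - T1) / cd                   # scaled difference statistic
pchisq(TRd, df = d0 - d1, lower.tail = FALSE)
## equivalently: lavTestLRT(fit1, fit0, method = "satorra.bentler.2001")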
Value
Invisibly, all model fits in the sequence are returned as a list.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Yves Rosseel (Ghent University; Yves.Rosseel@UGent.be)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Vandenberg, R. J., and Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4–70. doi:10.1177/109442810031002
See Also
Examples
model <- ' f1t1 =~ y1t1 + y2t1 + y3t1
f1t2 =~ y1t2 + y2t2 + y3t2
f1t3 =~ y1t3 + y2t3 + y3t3 '
## Create list of variables
var1 <- c("y1t1", "y2t1", "y3t1")
var2 <- c("y1t2", "y2t2", "y3t2")
var3 <- c("y1t3", "y2t3", "y3t3")
constrainedVar <- list(var1, var2, var3)
## Invariance of the same factor across timepoints
longInvariance(model, auto = 1, constrainAuto = TRUE,
varList = constrainedVar, data = exLong)
## Invariance of the same factor across timepoints and groups
longInvariance(model, auto = 1, constrainAuto = TRUE,
varList = constrainedVar, data = exLong, group = "sex",
group.equal = c("loadings", "intercepts"))
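The fitted models are returned invisibly (see Value above), so they can be captured and inspected afterward. The sketch below assumes each list element is a fitted lavaan model, as the Value section describes; check names() on the returned list to see how the models are labeled.
## Sketch: capture the invisibly returned list of fitted models
fitList <- longInvariance(model, auto = 1, constrainAuto = TRUE,
                          varList = constrainedVar, data = exLong)
names(fitList)   # check which models were fitted and how they are labeled
sapply(fitList, fitMeasures, fit.measures = c("cfi", "rmsea"))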
Calculate Population Moments for Ordinal Data Treated as Numeric
Description
This function calculates ordinal-scale moments implied by LRV-scale moments
Usage
lrv2ord(Sigma, Mu, thresholds, cWts)
Arguments
Sigma |
Population covariance |
Mu |
Optional |
thresholds |
Either a single |
cWts |
Optional (default when missing is to use 0 for the lowest
category, followed by successive integers for each higher category).
Either a single |
Details
Binary and ordinal data are frequently accommodated in SEM by incorporating a threshold model that links each observed categorical response variable to a corresponding latent response variable that is typically assumed to be normally distributed (Kamata & Bauer, 2008; Wirth & Edwards, 2007). This function can be useful for real-data analysis or for designing Monte Carlo simulations, as described by Jorgensen and Johnson (2022).
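As a toy illustration of the threshold model just described (not semTools code; the thresholds below are arbitrary), discretizing a normally distributed latent response yields an ordinal item whose category proportions are governed by the thresholds:
set.seed(1)
ystar <- rnorm(1000)                 # latent response variable (LRV)
tau <- c(-1, 0, 1)                   # hypothetical thresholds
y <- cut(ystar, breaks = c(-Inf, tau, Inf), labels = FALSE)  # observed ordinal item
table(y) / 1000                      # proportions approximate diff(c(0, pnorm(tau), 1))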
Value
A list including the LRV-scale population moments (means, covariance matrix, correlation matrix, and thresholds), the category weights, a data.frame of implied univariate moments (means, SDs, skewness, and excess kurtosis, i.e., in excess of 3, the kurtosis of the normal distribution) for discretized data treated as numeric, and the implied covariance and correlation matrix of discretized data treated as numeric
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Andrew R. Johnson (Curtin University; andrew.johnson@curtin.edu.au)
References
Jorgensen, T. D., & Johnson, A. R. (2022). How to derive expected values of structural equation model parameters when treating discrete data as continuous. Structural Equation Modeling, 29(4), 639–650. doi:10.1080/10705511.2021.1988609
Kamata, A., & Bauer, D. J. (2008). A note on the relation between factor analytic and item response theory models. Structural Equation Modeling, 15(1), 136–153. doi:10.1080/10705510701758406
Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58–79. doi:10.1037/1082-989X.12.1.58
Examples
## SCENARIO 1: DIRECTLY SPECIFY POPULATION PARAMETERS
## specify population model in LISREL matrices
Nu <- rep(0, 4)
Alpha <- c(1, -0.5)
Lambda <- matrix(c(1, 1, 0, 0, 0, 0, 1, 1), nrow = 4, ncol = 2,
dimnames = list(paste0("y", 1:4), paste0("eta", 1:2)))
Psi <- diag(c(1, .75))
Theta <- diag(4)
Beta <- matrix(c(0, .5, 0, 0), nrow = 2, ncol = 2)
## calculate model-implied population means and covariance matrix
## of latent response variables (LRVs)
IB <- solve(diag(2) - Beta) # to save time and space
Mu_LRV <- Nu + Lambda %*% IB %*% Alpha
Sigma_LRV <- Lambda %*% IB %*% Psi %*% t(IB) %*% t(Lambda) + Theta
## Specify (unstandardized) thresholds to discretize normally distributed data
## generated from Mu_LRV and Sigma_LRV, based on marginal probabilities
PiList <- list(y1 = c(.25, .5, .25),
y2 = c(.17, .33, .33, .17),
y3 = c(.1, .2, .4, .2, .1),
## make final variable highly asymmetric
y4 = c(.33, .25, .17, .12, .08, .05))
sapply(PiList, sum) # all sum to 100%
CumProbs <- sapply(PiList, cumsum)
## unstandardized thresholds
TauList <- mapply(qnorm, p = lapply(CumProbs, function(x) x[-length(x)]),
m = Mu_LRV, sd = sqrt(diag(Sigma_LRV)))
for (i in 1:4) names(TauList[[i]]) <- paste0(names(TauList)[i], "|t",
1:length(TauList[[i]]))
## assign numeric weights to each category (optional, see default)
NumCodes <- list(y1 = c(-0.5, 0, 0.5), y2 = 0:3, y3 = 1:5, y4 = 1:6)
## Calculate Population Moments for Numerically Coded Ordinal Variables
lrv2ord(Sigma = Sigma_LRV, Mu = Mu_LRV, thresholds = TauList, cWts = NumCodes)
## SCENARIO 2: USE ESTIMATED PARAMETERS AS POPULATION
data(datCat) # already stored as c("ordered","factor")
fit <- cfa(' f =~ 1*u1 + 1*u2 + 1*u3 + 1*u4 ', data = datCat)
lrv2ord(Sigma = fit, thresholds = fit) # use same fit for both
## or use estimated thresholds with specified parameters, but note that
## lrv2ord() will only extract standardized thresholds
dimnames(Sigma_LRV) <- list(paste0("u", 1:4), paste0("u", 1:4))
lrv2ord(Sigma = cov2cor(Sigma_LRV), thresholds = fit)
Finding Mardia's multivariate kurtosis
Description
Finding Mardia's multivariate kurtosis of multiple variables
Usage
mardiaKurtosis(dat, use = "everything")
Arguments
dat |
The target matrix or data frame with multiple variables |
use |
Missing data handling method from the |
Details
Mardia's multivariate kurtosis (Mardia, 1970) is defined as
b_{2, d} = \frac{1}{n}\sum^n_{i=1}\left[ \left(\bold{X}_i - \bold{\bar{X}} \right)^\prime \bold{S}^{-1} \left(\bold{X}_i - \bold{\bar{X}} \right) \right]^2,
where d is the number of variables, \bold{X} is the target data set with multiple variables, n is the sample size, \bold{S} is the sample covariance matrix of the target data set, and \bold{\bar{X}} is the vector of sample means. When the population distribution is multivariate normal, b_{2,d} is asymptotically normally distributed with mean d(d + 2) and variance 8d(d + 2)/n
Value
The value of Mardia's multivariate kurtosis, along with its test statistic
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530. doi:10.2307/2334770
See Also
-
skew()
Find the univariate skewness of a variable -
kurtosis()
Find the univariate excessive kurtosis of a variable -
mardiaSkew()
Find the Mardia's multivariate skewness of a set of variables
Examples
library(lavaan)
mardiaKurtosis(HolzingerSwineford1939[ , paste0("x", 1:9)])
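The statistic can also be recomputed by hand from the formula in the Details, as a sanity check. This sketch assumes complete data and the ML (divide-by-n) covariance estimator; results may differ slightly from mardiaKurtosis() if a different divisor is used internally.
X <- as.matrix(HolzingerSwineford1939[ , paste0("x", 1:9)])
n <- nrow(X); d <- ncol(X)
S <- cov(X) * (n - 1) / n               # ML covariance (assumed divisor)
Xc <- sweep(X, 2, colMeans(X))          # center each variable
md2 <- rowSums((Xc %*% solve(S)) * Xc)  # squared Mahalanobis distances
b2d <- mean(md2^2)                      # Mardia's multivariate kurtosis
(b2d - d * (d + 2)) / sqrt(8 * d * (d + 2) / n)  # asymptotic z statistic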
Finding Mardia's multivariate skewness
Description
Finding Mardia's multivariate skewness of multiple variables
Usage
mardiaSkew(dat, use = "everything")
Arguments
dat |
The target matrix or data frame with multiple variables |
use |
Missing data handling method from the |
Details
Mardia's multivariate skewness (Mardia, 1970) is defined as
b_{1, d} = \frac{1}{n^2}\sum^n_{i=1}\sum^n_{j=1}\left[ \left(\bold{X}_i - \bold{\bar{X}} \right)^\prime \bold{S}^{-1} \left(\bold{X}_j - \bold{\bar{X}} \right) \right]^3,
where d is the number of variables, \bold{X} is the target data set with multiple variables, n is the sample size, \bold{S} is the sample covariance matrix of the target data set, and \bold{\bar{X}} is the vector of sample means. When the population distribution is multivariate normal, \frac{n}{6}b_{1,d} is asymptotically \chi^2 distributed with d(d + 1)(d + 2)/6 degrees of freedom.
Value
The value of Mardia's multivariate skewness, along with its test statistic
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530. doi:10.2307/2334770
See Also
-
skew()
Find the univariate skewness of a variable -
kurtosis()
Find the univariate excessive kurtosis of a variable -
mardiaKurtosis()
Find the Mardia's multivariate kurtosis of a set of variables
Examples
library(lavaan)
mardiaSkew(HolzingerSwineford1939[ , paste0("x", 1:9)])
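As with the kurtosis statistic, the skewness can be recomputed by hand as a sanity check. The same assumptions apply (complete data, ML covariance divisor), so treat this only as an illustration of the formula above.
X <- as.matrix(HolzingerSwineford1939[ , paste0("x", 1:9)])
n <- nrow(X); d <- ncol(X)
S <- cov(X) * (n - 1) / n            # ML covariance (assumed divisor)
Xc <- sweep(X, 2, colMeans(X))       # center each variable
G <- Xc %*% solve(S) %*% t(Xc)       # n x n matrix of scaled cross-products
b1d <- mean(G^3)                     # Mardia's multivariate skewness
pchisq(n * b1d / 6, df = d * (d + 1) * (d + 2) / 6, lower.tail = FALSE)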
Calculate maximal reliability
Description
Calculate maximal reliability of a scale
Usage
maximalRelia(object, omit.imps = c("no.conv", "no.se"))
Arguments
object |
A lavaan::lavaan or lavaan.mi::lavaan.mi object, expected to contain only exogenous common factors (i.e., a CFA model). |
omit.imps |
|
Details
Given that a composite score (W) is a weighted sum of item scores,
W = \bold{w}^\prime \bold{x},
where \bold{x} is a k \times 1 vector of item scores, \bold{w} is a k \times 1 vector of item weights, and k is the number of items, maximal reliability is obtained by finding \bold{w} such that reliability attains its maximum (Li, 1997; Raykov, 2012). Note that the reliability can be obtained by
\rho = \frac{\bold{w}^\prime \bold{S}_T \bold{w}}{\bold{w}^\prime \bold{S}_X \bold{w}},
where \bold{S}_T is the covariance matrix explained by true scores and \bold{S}_X is the observed covariance matrix. A numerical method is used to find \bold{w} in this function.
For continuous items, \bold{S}_T can be calculated by
\bold{S}_T = \Lambda \Psi \Lambda^\prime,
where \Lambda is the factor loading matrix and \Psi is the covariance matrix among factors; \bold{S}_X is obtained directly as the covariance matrix among items.
For categorical items, Green and Yang's (2009) method is used to calculate \bold{S}_T and \bold{S}_X. Element (i, j) of \bold{S}_T can be calculated by
\left[\bold{S}_T\right]_{ij} = \sum^{C_i - 1}_{c_i = 1} \sum^{C_j - 1}_{c_j = 1} \Phi_2\left( \tau_{x_{c_i}}, \tau_{x_{c_j}}, \left[ \Lambda \Psi \Lambda^\prime \right]_{ij} \right) - \sum^{C_i - 1}_{c_i = 1} \Phi_1(\tau_{x_{c_i}}) \sum^{C_j - 1}_{c_j = 1} \Phi_1(\tau_{x_{c_j}}),
where C_i and C_j are the numbers of response categories of Items i and j (so each item has C_i - 1 thresholds), \tau_{x_{c_i}} is threshold c_i of Item i, \tau_{x_{c_j}} is threshold c_j of Item j, \Phi_1(\tau_{x_{c_i}}) is the cumulative probability of \tau_{x_{c_i}} under a univariate standard normal distribution, and \Phi_2\left( \tau_{x_{c_i}}, \tau_{x_{c_j}}, \rho \right) is the joint cumulative probability of \tau_{x_{c_i}} and \tau_{x_{c_j}} under a bivariate standard normal distribution with correlation \rho.
Each element of \bold{S}_X can be calculated by
\left[\bold{S}_X\right]_{ij} = \sum^{C_i - 1}_{c_i = 1} \sum^{C_j - 1}_{c_j = 1} \Phi_2\left( \tau_{V_{c_i}}, \tau_{V_{c_j}}, \rho^*_{ij} \right) - \sum^{C_i - 1}_{c_i = 1} \Phi_1(\tau_{V_{c_i}}) \sum^{C_j - 1}_{c_j = 1} \Phi_1(\tau_{V_{c_j}}),
where \rho^*_{ij} is the polychoric correlation between Items i and j
Value
Maximal reliability values for each group. The maximal-reliability weights are also provided; users may extract the weights using the attr
 function (see the example below).
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Li, H. (1997). A unifying expression for the maximal reliability of a linear composite. Psychometrika, 62(2), 245–249. doi:10.1007/BF02295278
Raykov, T. (2012). Scale construction and development using structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 472–494). New York, NY: Guilford.
See Also
reliability()
for reliability of an unweighted
composite score
Examples
total <- 'f =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '
fit <- cfa(total, data = HolzingerSwineford1939)
maximalRelia(fit)
# Extract the weight
mr <- maximalRelia(fit)
attr(mr, "weight")
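The numerical search described in the Details can be illustrated directly, treating the reliability ratio as an objective function. This is only a rough sketch: whether maximalRelia() uses the observed or the model-implied covariance matrix for S_X is an assumption here, so the result is expected to approximate (not exactly reproduce) the output above.
est <- lavInspect(fit, "est")
S_T <- est$lambda %*% est$psi %*% t(est$lambda)   # true-score covariance
S_X <- lavInspect(fit, "sampstat")$cov            # observed covariance among items (assumed choice)
relW <- function(w) as.numeric((t(w) %*% S_T %*% w) / (t(w) %*% S_X %*% w))
opt <- optim(rep(1, ncol(S_X)), fn = function(w) -relW(w), method = "BFGS")
relW(opt$par)                        # should approximate maximalRelia(fit)
opt$par / sum(opt$par)               # rescaled weights, compare to attr(mr, "weight")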
Syntax for measurement equivalence
Description
Automatically generates lavaan
model syntax to specify a confirmatory
factor analysis (CFA) model with equality constraints imposed on
user-specified measurement (or structural) parameters. Optionally returns
the fitted model (if data are provided) representing some chosen level of
measurement equivalence/invariance across groups and/or repeated measures.
Usage
measEq.syntax(configural.model, ..., ID.fac = "std.lv",
ID.cat = "Wu.Estabrook.2016", ID.thr = c(1L, 2L), group = NULL,
group.equal = "", group.partial = "", longFacNames = list(),
longIndNames = list(), long.equal = "", long.partial = "",
auto = "all", warn = TRUE, debug = FALSE, return.fit = FALSE)
Arguments
configural.model |
A model with no measurement-invariance constraints
(i.e., representing only configural invariance), unless required for model
identification.
Note that the specified or fitted model must not contain any latent structural parameters (i.e., it must be a CFA model), unless they are higher-order constructs with latent indicators (i.e., a second-order CFA). |
... |
Additional arguments (e.g., |
ID.fac |
See Kloessner & Klopp (2019) for details about all three methods. |
ID.cat |
See Details and References for more information. |
ID.thr |
|
group |
optional |
group.equal |
optional |
group.partial |
optional |
longFacNames |
optional named |
longIndNames |
optional named |
long.equal |
optional |
long.partial |
optional |
auto |
Used to automatically include autocorrelated measurement errors
among repeatedly measured indicators in |
warn , debug |
|
return.fit |
|
Details
This function is a pedagogical and analytical tool to generate model syntax representing some level of measurement equivalence/invariance across any combination of multiple groups and/or repeated measures. Support is provided for confirmatory factor analysis (CFA) models with simple or complex structure (i.e., cross-loadings and correlated residuals are allowed). For any complexities that exceed the limits of automation, this function is intended to still be useful by providing a means to generate syntax that users can easily edit to accommodate their unique situations.
Limited support is provided for bifactor models and higher-order constructs.
Because bifactor models have cross-loadings by definition, the option
ID.fac = "effects.code"
is unavailable. ID.fac = "UV"
is
recommended for bifactor models, but ID.fac = "UL"
is available on
the condition that each factor has a unique first indicator in the
configural.model
. In order to maintain generality, higher-order
factors may include a mix of manifest and latent indicators, but they must
therefore require ID.fac = "UL"
to avoid complications with
differentiating lower-order vs. higher-order (or mixed-level) factors.
The keyword "loadings"
in group.equal
or long.equal
constrains factor loadings of all manifest indicators (including loadings on
higher-order factors that also have latent indicators), whereas the keyword
"regressions"
constrains factor loadings of latent indicators. Users
can edit the model syntax manually to adjust constraints as necessary, or
clever use of the group.partial
or long.partial
arguments
could make it possible for users to still automate their model syntax.
The keyword "intercepts"
constrains the intercepts of all manifest
indicators, and the keyword "means"
constrains intercepts and means
of all latent common factors, regardless of whether they are latent
indicators of higher-order factors. To test equivalence of lower-order and
higher-order intercepts/means in separate steps, the user can either
manually edit their generated syntax or conscientiously exploit the
group.partial
or long.partial
arguments as necessary.
ID.fac
: If the configural.model
fixes any (e.g.,
the first) factor loadings, the generated syntax object will retain those
fixed values. This allows the user to retain additional constraints that
might be necessary (e.g., if there are only 1 or 2 indicators). Some methods
must be used in conjunction with other settings:
-
ID.cat = "Millsap"
requiresID.fac = "UL"
andparameterization = "theta"
. -
ID.cat = "LISREL"
requiresparameterization = "theta"
. -
ID.fac = "effects.code"
is unavailable when there are any cross-loadings.
ID.cat
: Wu & Estabrook (2016) recommended constraining
thresholds to equality first, and doing so should allow releasing any
identification constraints no longer needed. For each ordered
indicator, constraining one threshold to equality will allow the item's
intercepts to be estimated in all but the first group or repeated measure.
Constraining a second threshold (if applicable) will allow the item's
(residual) variance to be estimated in all but the first group or repeated
measure. For binary data, there is no independent test of threshold,
intercept, or residual-variance equality. Equivalence of thresholds must
also be assumed for three-category indicators. These guidelines provide the
least restrictive assumptions and tests, and are therefore the default.
The default setting in Mplus is similar to Wu & Estabrook (2016),
except that intercepts are always constrained to zero (so they are assumed
to be invariant without testing them). Millsap & Tein (2004) recommended
parameterization = "theta"
and identified an item's residual variance
in all but the first group (or occasion; Liu et al., 2017) by constraining
its intercept to zero and one of its thresholds to equality. A second
threshold for the reference indicator (so ID.fac = "UL"
) is used to
identify the common-factor means in all but the first group/occasion. The
LISREL software fixes the first threshold to zero and (if applicable) the
second threshold to 1, and assumes any remaining thresholds to be equal
across groups / repeated measures; thus, the intercepts are always
identified, and residual variances (parameterization = "theta"
) are
identified except for binary data, when they are all fixed to one.
Repeated Measures: If each repeatedly measured factor is measured
by the same indicators (specified in the same order in the
configural.model
) on each occasion, without any cross-loadings, the
user can let longIndNames
be automatically generated. Generic names
for the repeatedly measured indicators are created using the name of the
repeatedly measured factors (i.e., names(longFacNames)
) and the
number of indicators. So the repeatedly measured first indicator
("ind"
) of a longitudinal construct called "factor" would be
generated as "._factor_ind.1"
.
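When indicator names differ across occasions (or cross-loadings prevent automatic matching), longIndNames can be specified manually. The sketch below is hypothetical: it assumes u1/u5, u2/u6, etc. are the same items administered at the two occasions, and the generic labels on the left are arbitrary names chosen for illustration.
longIndNames <- list(FUind1 = c("u1","u5"),
                     FUind2 = c("u2","u6"),
                     FUind3 = c("u3","u7"),
                     FUind4 = c("u4","u8"))
## then pass longIndNames = longIndNames (along with longFacNames) to measEq.syntax()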
The same types of parameter can be specified for long.equal
as for
group.equal
(see lavaan::lavOptions()
for a list), except
for "residual.covariances"
or "lv.covariances"
. Instead, users
can constrain autocovariances using keywords "resid.autocov"
or "lv.autocov"
. Note that group.equal = "lv.covariances"
or
group.equal = "residual.covariances"
will constrain any
autocovariances across groups, along with any other covariances the user
specified in the configural.model
. Note also that autocovariances
cannot be specified as exceptions in long.partial
, so anything more
complex than the auto
argument automatically provides should instead
be manually specified in the configural.model
.
When users set orthogonal=TRUE
in the configural.model
(e.g.,
in bifactor models of repeatedly measured constructs), autocovariances of
each repeatedly measured factor will still be freely estimated in the
generated syntax.
Missing Data: If users wish to utilize the auxiliary()
function to automatically include auxiliary variables in conjunction with
missing = "FIML"
, they should first generate the hypothesized-model
syntax, then submit that syntax as the model to auxiliary()
.
If users utilized lavaan.mi::lavaan.mi()
to fit their configural.model
to multiply imputed data, that model can also be passed to the
configural.model
argument, and if return.fit = TRUE
, the
generated model will be fitted to the multiple imputations.
Value
By default, an object of class measEq.syntax.
If return.fit = TRUE
, a fitted lavaan::lavaan()
model, with the measEq.syntax
object stored in the
@external
slot, accessible by fit@external$measEq.syntax
.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Kloessner, S., & Klopp, E. (2019). Explaining constraint interaction: How to interpret estimated model parameters under alternative scaling methods. Structural Equation Modeling, 26(1), 143–155. doi:10.1080/10705511.2018.1517356
Liu, Y., Millsap, R. E., West, S. G., Tein, J.-Y., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22(3), 486–506. doi:10.1037/met0000075
Millsap, R. E., & Tein, J.-Y. (2004). Assessing factorial invariance in ordered-categorical measures. Multivariate Behavioral Research, 39(3), 479–515. doi:10.1207/S15327906MBR3903_4
Wu, H., & Estabrook, R. (2016). Identification of confirmatory factor analysis models of different levels of invariance for ordered categorical outcomes. Psychometrika, 81(4), 1014–1045. doi:10.1007/s11336-016-9506-0
See Also
Examples
mod.cat <- ' FU1 =~ u1 + u2 + u3 + u4
FU2 =~ u5 + u6 + u7 + u8 '
## the 2 factors are actually the same factor (FU) measured twice
longFacNames <- list(FU = c("FU1","FU2"))
## CONFIGURAL model: no constraints across groups or repeated measures
syntax.config <- measEq.syntax(configural.model = mod.cat,
# NOTE: data provides info about numbers of
# groups and thresholds
data = datCat,
ordered = paste0("u", 1:8),
parameterization = "theta",
ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
group = "g", longFacNames = longFacNames)
## print lavaan syntax to the Console
cat(as.character(syntax.config))
## print a summary of model features
summary(syntax.config)
## THRESHOLD invariance:
## only necessary to specify thresholds if you have no data
mod.th <- '
u1 | t1 + t2 + t3 + t4
u2 | t1 + t2 + t3 + t4
u3 | t1 + t2 + t3 + t4
u4 | t1 + t2 + t3 + t4
u5 | t1 + t2 + t3 + t4
u6 | t1 + t2 + t3 + t4
u7 | t1 + t2 + t3 + t4
u8 | t1 + t2 + t3 + t4
'
syntax.thresh <- measEq.syntax(configural.model = c(mod.cat, mod.th),
# NOTE: data not provided, so syntax must
# include thresholds, and number of
# groups == 2 is indicated by:
sample.nobs = c(1, 1),
parameterization = "theta",
ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
group = "g", group.equal = "thresholds",
longFacNames = longFacNames,
long.equal = "thresholds")
## notice that constraining 4 thresholds allows intercepts and residual
## variances to be freely estimated in all but the first group & occasion
cat(as.character(syntax.thresh))
## print a summary of model features
summary(syntax.thresh)
## Fit a model to the data either in a subsequent step (recommended):
mod.config <- as.character(syntax.config)
fit.config <- cfa(mod.config, data = datCat, group = "g",
ordered = paste0("u", 1:8), parameterization = "theta")
## or in a single step (not generally recommended):
fit.thresh <- measEq.syntax(configural.model = mod.cat, data = datCat,
ordered = paste0("u", 1:8),
parameterization = "theta",
ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
group = "g", group.equal = "thresholds",
longFacNames = longFacNames,
long.equal = "thresholds", return.fit = TRUE)
## compare their fit to test threshold invariance
anova(fit.config, fit.thresh)
## --------------------------------------------------------
## RECOMMENDED PRACTICE: fit one invariance model at a time
## --------------------------------------------------------
## - A downside of setting return.fit=TRUE is that if the model has trouble
## converging, you don't have the opportunity to investigate the syntax,
## or even to know whether an error resulted from the syntax-generator or
## from lavaan itself.
## - A downside of automatically fitting an entire set of invariance models
## (like the old measurementInvariance() function did) is that you might
## end up testing models that shouldn't even be fitted because less
## restrictive models already fail (e.g., don't test full scalar
## invariance if metric invariance fails! Establish partial metric
## invariance first, then test equivalence of intercepts ONLY among the
## indicators that have invariant loadings.)
## The recommended sequence is to (1) generate and save each syntax object,
## (2) print it to the screen to verify you are fitting the model you expect
## to (and potentially learn which identification constraints should be
## released when equality constraints are imposed), and (3) fit that model
## to the data, as you would if you had written the syntax yourself.
## Continuing from the examples above, after establishing invariance of
## thresholds, we proceed to test equivalence of loadings and intercepts
## (metric and scalar invariance, respectively)
## simultaneously across groups and repeated measures.
## metric invariance
syntax.metric <- measEq.syntax(configural.model = mod.cat, data = datCat,
ordered = paste0("u", 1:8),
parameterization = "theta",
ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
group = "g", longFacNames = longFacNames,
group.equal = c("thresholds","loadings"),
long.equal = c("thresholds","loadings"))
summary(syntax.metric) # summarize model features
mod.metric <- as.character(syntax.metric) # save as text
cat(mod.metric) # print/view lavaan syntax
## fit model to data
fit.metric <- cfa(mod.metric, data = datCat, group = "g",
ordered = paste0("u", 1:8), parameterization = "theta")
## test equivalence of loadings, given equivalence of thresholds
anova(fit.thresh, fit.metric)
## scalar invariance
syntax.scalar <- measEq.syntax(configural.model = mod.cat, data = datCat,
ordered = paste0("u", 1:8),
parameterization = "theta",
ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
group = "g", longFacNames = longFacNames,
group.equal = c("thresholds","loadings",
"intercepts"),
long.equal = c("thresholds","loadings",
"intercepts"))
summary(syntax.scalar) # summarize model features
mod.scalar <- as.character(syntax.scalar) # save as text
cat(mod.scalar) # print/view lavaan syntax
## fit model to data
fit.scalar <- cfa(mod.scalar, data = datCat, group = "g",
ordered = paste0("u", 1:8), parameterization = "theta")
## test equivalence of intercepts, given equal thresholds & loadings
anova(fit.metric, fit.scalar)
## For a single table with all results, you can pass the models to
## summarize to the compareFit() function
Comparisons <- compareFit(fit.config, fit.thresh, fit.metric, fit.scalar)
summary(Comparisons)
## ------------------------------------------------------
## NOT RECOMMENDED: fit several invariance models at once
## ------------------------------------------------------
test.seq <- c("thresholds","loadings","intercepts","means","residuals")
meq.list <- list()
for (i in 0:length(test.seq)) {
if (i == 0L) {
meq.label <- "configural"
group.equal <- ""
long.equal <- ""
} else {
meq.label <- test.seq[i]
group.equal <- test.seq[1:i]
long.equal <- test.seq[1:i]
}
meq.list[[meq.label]] <- measEq.syntax(configural.model = mod.cat,
data = datCat,
ordered = paste0("u", 1:8),
parameterization = "theta",
ID.fac = "std.lv",
ID.cat = "Wu.Estabrook.2016",
group = "g",
group.equal = group.equal,
longFacNames = longFacNames,
long.equal = long.equal,
return.fit = TRUE)
}
evalMeasEq <- compareFit(meq.list)
summary(evalMeasEq)
## -----------------
## Binary indicators
## -----------------
## borrow example data from Mplus user guide
myData <- read.table("http://www.statmodel.com/usersguide/chap5/ex5.16.dat")
names(myData) <- c("u1","u2","u3","u4","u5","u6","x1","x2","x3","g")
bin.mod <- '
FU1 =~ u1 + u2 + u3
FU2 =~ u4 + u5 + u6
'
## Must SIMULTANEOUSLY constrain thresholds, loadings, and intercepts
test.seq <- list(strong = c("thresholds","loadings","intercepts"),
means = "means",
strict = "residuals")
meq.list <- list()
for (i in 0:length(test.seq)) {
if (i == 0L) {
meq.label <- "configural"
group.equal <- ""
long.equal <- ""
} else {
meq.label <- names(test.seq)[i]
group.equal <- unlist(test.seq[1:i])
# long.equal <- unlist(test.seq[1:i])
}
meq.list[[meq.label]] <- measEq.syntax(configural.model = bin.mod,
data = myData,
ordered = paste0("u", 1:6),
parameterization = "theta",
ID.fac = "std.lv",
ID.cat = "Wu.Estabrook.2016",
group = "g",
group.equal = group.equal,
#longFacNames = longFacNames,
#long.equal = long.equal,
return.fit = TRUE)
}
evalMeasEq <- compareFit(meq.list)
summary(evalMeasEq)
## ---------------------
## Multilevel Invariance
## ---------------------
## To test invariance across levels in a MLSEM, specify syntax as though
## you are fitting to 2 groups instead of 2 levels.
mlsem <- ' f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6 '
## metric invariance
syntax.metric <- measEq.syntax(configural.model = mlsem, meanstructure = TRUE,
ID.fac = "std.lv", sample.nobs = c(1, 1),
group = "cluster", group.equal = "loadings")
## by definition, Level-1 means must be zero, so fix them
syntax.metric <- update(syntax.metric,
change.syntax = paste0("y", 1:6, " ~ c(0, NA)*1"))
## save as a character string
mod.metric <- as.character(syntax.metric, groups.as.blocks = TRUE)
## convert from multigroup to multilevel
mod.metric <- gsub(pattern = "group:", replacement = "level:",
x = mod.metric, fixed = TRUE)
## fit model to data
fit.metric <- lavaan(mod.metric, data = Demo.twolevel, cluster = "cluster")
summary(fit.metric)
Class for Representing a Measurement-Equivalence Model
Description
This class of object stores information used to automatically generate
lavaan model syntax to represent user-specified levels of measurement
equivalence/invariance across groups and/or repeated measures. See
measEq.syntax()
for details.
Usage
## S4 method for signature 'measEq.syntax'
as.character(x, package = "lavaan",
params = NULL, single = TRUE, groups.as.blocks = FALSE)
## S4 method for signature 'measEq.syntax'
show(object)
## S4 method for signature 'measEq.syntax'
summary(object, verbose = TRUE)
## S4 method for signature 'measEq.syntax'
update(object, ..., evaluate = TRUE,
change.syntax = NULL)
Arguments
x , object |
an object of class |
package |
|
params |
|
single |
|
groups.as.blocks |
|
verbose |
|
... |
Additional arguments to the |
evaluate |
If |
change.syntax |
|
Value
summary |
|
show |
|
update |
|
as.character |
|
Slots
package
character
indicating the software package used to represent the model. Currently, only"lavaan"
is available, which uses the LISREL representation (seelavaan::lavOptions()
). In the future,"OpenMx"
may become available, using RAM representation.model.type
character
. Currently, only "cfa" is available. Future versions may allow for MIMIC / RFA models, where invariance can be tested across levels of exogenous variables explicitly included as predictors of indicators, controlling for their effects on (or correlation with) the common factors.call
The function call as returned by
match.call()
, with some arguments updated if necessary for logical consistency.meanstructure
logical
indicating whether a mean structure is included in the model.numeric
character
vector namingnumeric
manifest indicators.ordered
character
vector namingordered
indicators.parameterization
character
. Seelavaan::lavOptions()
.specify
list
of parameter matrices, similar in form to the output oflavInspect(fit, "free")
. These matrices arelogical
, indicating whether each parameter should be specified in the model syntax.values
list
of parameter matrices, similar in form to the output oflavInspect(fit, "free")
. These matrices arenumeric
, indicating whether each parameter should be freely estimated (indicated byNA
) or fixed to a particular value.labels
list
of parameter matrices, similar in form to the output oflavInspect(fit, "free")
. These matrices containcharacter
labels used to constrain parameters to equality.constraints
character
vector containing additional equality constraints used to identify the model whenID.fac = "fx"
.ngroups
integer
indicating the number of groups.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Examples
## See ?measEq.syntax help page for examples using lavaan
Measurement Invariance Tests
Description
Testing measurement invariance across groups using a typical sequence of model comparison tests.
Usage
measurementInvariance(..., std.lv = FALSE, strict = FALSE, quiet = FALSE,
fit.measures = "default", baseline.model = NULL,
method = "satorra.bentler.2001")
Arguments
... |
The same arguments as for any lavaan model. See
|
std.lv |
If |
strict |
If |
quiet |
If |
fit.measures |
Fit measures used to calculate the differences between nested models. |
baseline.model |
custom baseline model passed to
|
method |
The method used to calculate likelihood ratio test. See
|
Details
If strict = FALSE
, the following four models are tested in order:
Model 1: configural invariance. The same factor structure is imposed on all groups.
Model 2: weak invariance. The factor loadings are constrained to be equal across groups.
Model 3: strong invariance. The factor loadings and intercepts are constrained to be equal across groups.
Model 4: The factor loadings, intercepts and means are constrained to be equal across groups.
Each time a more restricted model is fitted, a \Delta\chi^2
test is
reported, comparing the current model with the previous one, and comparing
the current model to the baseline model (Model 1). In addition, the
difference in CFI is also reported (\Delta
CFI).
If strict = TRUE
, the following five models are tested in order:
Model 1: configural invariance. The same factor structure is imposed on all groups.
Model 2: weak invariance. The factor loadings are constrained to be equal across groups.
Model 3: strong invariance. The factor loadings and intercepts are constrained to be equal across groups.
Model 4: strict invariance. The factor loadings, intercepts and residual variances are constrained to be equal across groups.
Model 5: The factor loadings, intercepts, residual variances and means are constrained to be equal across groups.
Note that if the \chi^2
test statistic is scaled (e.g., a Satorra-Bentler
or Yuan-Bentler test statistic), a special version of the \Delta\chi^2
test is used as described in http://www.statmodel.com/chidiff.shtml
Value
Invisibly, all model fits in the sequence are returned as a list.
Author(s)
Yves Rosseel (Ghent University; Yves.Rosseel@UGent.be)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Vandenberg, R. J., and Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4–70. doi:10.1177/109442810031002
See Also
Examples
HW.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
measurementInvariance(model = HW.model, data = HolzingerSwineford1939,
group = "school", fit.measures = c("cfi","aic"))
Measurement Invariance Tests for Categorical Items
Description
Testing measurement invariance across groups using a typical sequence of model comparison tests.
Usage
measurementInvarianceCat(..., std.lv = FALSE, strict = FALSE,
quiet = FALSE, fit.measures = "default",
baseline.model = NULL, method = "default")
Arguments
... |
The same arguments as for any lavaan model. See
|
std.lv |
If |
strict |
If |
quiet |
If |
fit.measures |
Fit measures used to calculate the differences between nested models. |
baseline.model |
custom baseline model passed to
|
method |
The method used to calculate likelihood ratio test. See
|
Details
Theta parameterization is used to represent the SEM for categorical items; that is, residual variances are modeled instead of the total variance of the underlying normal variate for each item. Five models can be tested, based on different constraints across groups.
Model 1: configural invariance. The same factor structure is imposed on all groups.
Model 2: weak invariance. The factor loadings are constrained to be equal across groups.
Model 3: strong invariance. The factor loadings and thresholds are constrained to be equal across groups.
Model 4: strict invariance. The factor loadings, thresholds and residual variances are constrained to be equal across groups. For categorical variables, all residual variances are fixed as 1.
Model 5: The factor loadings, thresholds, residual variances and means are constrained to be equal across groups.
However, if all items are dichotomous (have only two categories), weak and
strong (scalar) invariance cannot be separated, because thresholds must be
equal across groups for scale identification. Users can specify the strict
argument to include the strict-invariance model in the invariance testing.
See Millsap and Yun-Tein (2004) for further details about scale
identification and the different parameterizations.
Value
Invisibly, all model fits in the sequence are returned as a list.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Yves Rosseel (Ghent University; Yves.Rosseel@UGent.be)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Millsap, R. E., & Yun-Tein, J. (2004). Assessing factorial invariance in ordered-categorical measures. Multivariate Behavioral Research, 39(3), 479–515. doi:10.1207/S15327906MBR3903_4
See Also
Examples
syntax <- ' f1 =~ u1 + u2 + u3 + u4'
measurementInvarianceCat(model = syntax, data = datCat, group = "g",
parameterization = "theta", estimator = "wlsmv",
ordered = c("u1", "u2", "u3", "u4"))
Modification indices and their power approach for model fit evaluation
Description
The model fit evaluation approach using modification indices and expected parameter changes.
Usage
miPowerFit(lavaanObj, stdLoad = 0.4, cor = 0.1, stdBeta = 0.1,
intcept = 0.2, stdDelta = NULL, delta = NULL, cilevel = 0.9, ...)
Arguments
lavaanObj |
The lavaan model object used to evaluate model fit |
stdLoad |
The amount of standardized factor loading that one would like to be detected (rejected). The default value is 0.4, which is suggested by Saris and colleagues (2009, p. 571). |
cor |
The amount of factor or error correlations that one would like to be detected (rejected). The default value is 0.1, which is suggested by Saris and colleagues (2009, p. 571). |
stdBeta |
The amount of standardized regression coefficients that one would like to be detected (rejected). The default value is 0.1, which is suggested by Saris and colleagues (2009, p. 571). |
intcept |
The amount of standardized intercept (similar to Cohen's d) that one would like to be detected (rejected). The default value is 0.2, which is equivalent to a small effect size proposed by Cohen (1988, 1992). |
stdDelta |
The vector of the standardized parameters that one would
like to be detected (rejected). If this argument is specified, the value
here will overwrite the other arguments above. The order of the vector
must be the same as the row order from modification indices from the
|
delta |
The vector of the unstandardized parameters that one would like
to be detected (rejected). If this argument is specified, the value here
will overwrite the other arguments above. The order of the vector must be
the same as the row order from modification indices from the |
cilevel |
The confidence level of the confidence interval of expected parameter changes. The confidence intervals are used in the equivalence testing. |
... |
arguments passed to |
Details
To decide whether a parameter should be freed, one can inspect its modification index (MI) and expected parameter change (EPC). Those values can be used to evaluate model fit by two methods.
Method 1: Saris, Satorra, and van der Veld (2009, pp. 570–573) used power (the probability of detecting a significant MI) and EPC to decide whether to free a parameter. First, one should evaluate whether a parameter's MI is significant. Second, one should evaluate whether the power to detect a target EPC is high enough. The combination of criteria leads to the so-called "JRule," first implemented in LISREL (van der Veld et al., 2008):
If the MI is not significant and the power is low, the test is inconclusive.
If the MI is not significant and the power is high, there is no misspecification.
If the MI is significant and the power is low, the fixed parameter is misspecified.
If the MI is significant and the power is high, the EPC is investigated. If the EPC is large (greater than the target EPC), the parameter is misspecified. If the EPC is small (lower than the target EPC), the parameter is not misspecified.
Method 2: The confidence interval (CI) of an EPC is calculated.
These CIs are compared with the range of trivial
misspecification, which could be (-delta
, delta
) or (0,
delta
) for nonnegative parameters.
If a CI overlaps with the range of trivial misspecification, the test is inconclusive.
If a CI completely exceeds the range of trivial misspecification, the fixed parameters are severely misspecified.
If a CI is completely within the range of trivial misspecification, the fixed parameters are trivially misspecified.
Value
A data frame with these variables:
-
lhs
: The left-hand side variable, with respect to the operator in the lavaan::model.syntax()
-
op
: The lavaan syntax operator: "~~" represents covariance, "=~" represents factor loading, "~" represents regression, and "~1" represents intercept. -
rhs
: The right-hand side variable -
group
: The level of the group variable for the parameter in question -
mi
: The modification index of the fixed parameter -
epc
: The EPC if the parameter is freely estimated -
target.epc
: The target EPC that represents the minimum size of misspecification that one would like to be detected by the test with a high power -
std.epc
: The standardized EPC if the parameter is freely estimated -
std.target.epc
: The standardized target expected parameter change -
significant.mi
: Represents whether the modification index value is significant -
high.power
: Represents whether the power is enough to detect the target expected parameter change -
decision.pow
: The decision whether the parameter is misspecified or not based on Saris et al's method:"M"
represents the parameter is misspecified,"NM"
represents the parameter is not misspecified,"EPC:M"
represents the parameter is misspecified decided by checking the expected parameter change value,"EPC:NM"
represents the parameter is not misspecified decided by checking the expected parameter change value, and"I"
represents the decision is inconclusive. -
se.epc
: The standard errors of the expected parameter changes. -
lower.epc
: The lower bound of the confidence interval of expected parameter changes. -
upper.epc
: The upper bound of the confidence interval of expected parameter changes. -
lower.std.epc
: Lower confidence limit of standardized EPCs -
upper.std.epc
: Upper confidence limit of standardized EPCs -
decision.ci
: Decision whether the parameter is misspecified based on the CI method:"M"
represents the parameter is misspecified,"NM"
represents the parameter is not misspecified, and"I"
represents the decision is inconclusive.
The row numbers match the results obtained from the
inspect(object, "mi")
function.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. doi:10.1037/0033-2909.112.1.155
Saris, W. E., Satorra, A., & van der Veld, W. M. (2009). Testing structural equation models or detection of misspecifications? Structural Equation Modeling, 16(4), 561–582. doi:10.1080/10705510903203433
van der Veld, W. M., Saris, W. E., & Satorra, A. (2008). JRule 3.0 Users Guide. doi:10.13140/RG.2.2.13609.90729
See Also
moreFitIndices()
For the additional fit indices
information
Examples
library(lavaan)
HS.model <- ' visual =~ x1 + x2 + x3 '
fit <- cfa(HS.model, data = HolzingerSwineford1939,
group = "sex", group.equal = c("loadings","intercepts"))
miPowerFit(fit, free.remove = FALSE, op = "=~") # loadings
miPowerFit(fit, free.remove = FALSE, op = "~1") # intercepts
model <- '
# latent variable definitions
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + a*y2 + b*y3 + c*y4
dem65 =~ y5 + a*y6 + b*y7 + c*y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
fit2 <- sem(model, data = PoliticalDemocracy, meanstructure = TRUE)
miPowerFit(fit2, stdLoad = 0.3, cor = 0.2, stdBeta = 0.2, intcept = 0.5)
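Because the result is a data frame (see Value above), flagged parameters can be filtered directly. A small sketch, reusing fit2 from the example above:
out <- miPowerFit(fit2, stdLoad = 0.3, cor = 0.2, stdBeta = 0.2, intcept = 0.5)
## parameters judged misspecified by the power-based (Saris et al.) method
subset(out, decision.pow %in% c("M", "EPC:M"))
## parameters whose CI-based decision is inconclusive
subset(out, decision.ci == "I")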
Modification Indices for Multiple Imputations
Description
Modification indices (1-df Lagrange multiplier tests) from a latent variable model fitted to multiple imputed data sets. Statistics for releasing one or more fixed or constrained parameters in model can be calculated by pooling the gradient and information matrices across imputed data sets in a method proposed by Mansolf, Jorgensen, & Enders (2020)—analogous to the "D1" Wald test proposed by Li, Meng, Raghunathan, & Rubin (1991)—or by pooling the complete-data score-test statistics across imputed data sets (i.e., "D2"; Li et al., 1991).
Usage
modindices.mi(object, test = c("D2","D1"), omit.imps = c("no.conv","no.se"),
standardized = TRUE, cov.std = TRUE, information = "expected",
power = FALSE, delta = 0.1, alpha = 0.05, high.power = 0.75,
sort. = FALSE, minimum.value = 0, maximum.number = nrow(LIST),
na.remove = TRUE, op = NULL)
modificationIndices.mi(object, test = c("D2","D1"),
omit.imps = c("no.conv","no.se"),
standardized = TRUE, cov.std = TRUE,
information = "expected", power = FALSE, delta = 0.1,
alpha = 0.05, high.power = 0.75, sort. = FALSE,
minimum.value = 0, maximum.number = nrow(LIST),
na.remove = TRUE, op = NULL)
modificationindices.mi(object, test = c("D2","D1"),
omit.imps = c("no.conv","no.se"),
standardized = TRUE, cov.std = TRUE,
information = "expected", power = FALSE, delta = 0.1,
alpha = 0.05, high.power = 0.75, sort. = FALSE,
minimum.value = 0, maximum.number = nrow(LIST),
na.remove = TRUE, op = NULL)
Arguments
object |
An object of class OLDlavaan.mi |
test |
|
omit.imps |
|
standardized |
|
cov.std |
|
information |
|
power |
|
delta |
The value of the effect size, as used in the post-hoc power
computation, currently using the unstandardized metric of the |
alpha |
The significance level used for deciding if the modification index is statistically significant or not. |
high.power |
If the computed power is higher than this cutoff value,
the power is considered 'high'. If not, the power is considered 'low'.
This affects the values in the |
sort. |
|
minimum.value |
|
maximum.number |
|
na.remove |
|
op |
|
Value
A data.frame
containing modification indices and (S)EPCs.
Note
When test = "D2"
, each (S)EPC will be pooled by taking its
average across imputations. When test = "D1"
, EPCs will be
calculated in the standard way using the pooled gradient and information,
and SEPCs will be calculated by standardizing the EPCs using model-implied
(residual) variances.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Adapted from lavaan source code, written by Yves Rosseel (Ghent University; Yves.Rosseel@UGent.be)
test = "D1"
method proposed by
Maxwell Mansolf (University of California, Los Angeles;
mamansolf@gmail.com)
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Mansolf, M., Jorgensen, T. D., & Enders, C. K. (2020). A multiple imputation score test for model modification in structural equation models. Psychological Methods, 25(4), 393–411. doi:10.1037/met0000243
See Also
Examples
## See the new lavaan.mi package
Monte Carlo Confidence Intervals to Test Functions of Parameter Estimates
Description
Robust confidence intervals for functions of parameter estimates, based on empirical sampling distributions of estimated model parameters.
Usage
monteCarloCI(object = NULL, expr, coefs, ACM, nRep = 20000,
standardized = FALSE, fast = TRUE, level = 0.95, na.rm = TRUE,
append.samples = FALSE, plot = FALSE,
ask = getOption("device.ask.default"), ...)
Arguments
object |
A object of class lavaan::lavaan in which
functions of parameters have already been defined using the |
expr |
Optional |
coefs |
|
ACM |
Symmetric |
nRep |
|
standardized |
|
fast |
|
level |
|
na.rm |
|
append.samples |
|
plot |
|
ask |
whether to prompt user before printing each plot |
... |
arguments passed to |
Details
This function implements the Monte Carlo method of obtaining an empirical
sampling distribution of estimated model parameters, as described by
MacKinnon et al. (2004) for testing indirect effects in mediation models.
This is essentially a parametric bootstrap method, which (re)samples
parameters (rather than raw data) from a multivariate-normal distribution
with mean vector equal to estimates in coef()
and covariance matrix
equal to the asymptotic covariance matrix vcov()
of estimated parameters.
The easiest way to use the function is to fit a SEM to data with
lavaan::lavaan()
, using the :=
operator in the
lavaan::model.syntax()
to specify user-defined parameters.
All information is then available in the resulting
lavaan::lavaan object. Alternatively (especially when using
external SEM software to fit the model), the expression(s) can be explicitly
passed to the function, along with the vector of estimated model parameters
and their associated asymptotic sampling covariance matrix (ACOV).
For further information on the Monte Carlo method, see MacKinnon et al.
(2004) and Preacher & Selig (2012).
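A bare-bones sketch of this parametric-bootstrap principle follows, assuming fit is the labeled mediation model shown in the Examples below (MASS is listed in Suggests). monteCarloCI() handles all of this automatically, so the sketch is only illustrative:
library(MASS)                                   # for mvrnorm()
draws <- mvrnorm(20000, mu = coef(fit)[c("a","b")],
                 Sigma = vcov(fit)[c("a","b"), c("a","b")])
quantile(draws[ , "a"] * draws[ , "b"], c(.025, .975))   # 95% Monte Carlo CI for a*b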
The asymptotic covariance matrix can be obtained easily from many popular SEM software packages.
LISREL: Including the EC option on the OU line will print the ACM to a separate file. The file contains the lower triangular elements of the ACM in free format and scientific notation.
Mplus: Include the command TECH3; in the OUTPUT section. The ACM will be printed in the output.
lavaan
: Use thevcov()
method on the fitted lavaan::lavaan object to return the ACM.
Value
A lavaan.data.frame
(to use lavaan's print
method)
with point estimates and confidence limits of each requested function of
parameters in expr
is returned. If append.samples = TRUE
,
output will be a list
with the same $Results
along with a
second data.frame
with the $Samples
(in rows) of each
parameter (in columns), and an additional column for each requested
function of those parameters.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1), 99–128. doi:10.1207/s15327906mbr3901_4
Preacher, K. J., & Selig, J. P. (2010, July). Monte Carlo method for assessing multilevel mediation: An interactive tool for creating confidence intervals for indirect effects in 1-1-1 multilevel models. Computer software available from http://quantpsy.org/.
Preacher, K. J., & Selig, J. P. (2012). Advantages of Monte Carlo confidence intervals for indirect effects. Communication Methods and Measures, 6(2), 77–98. doi:10.1080/19312458.2012.679848
Selig, J. P., & Preacher, K. J. (2008, June). Monte Carlo method for assessing mediation: An interactive tool for creating confidence intervals for indirect effects. Computer software available from http://quantpsy.org/.
Examples
## From the mediation tutorial:
## http://lavaan.ugent.be/tutorial/mediation.html
set.seed(1234)
X <- rnorm(100)
M <- 0.5*X + rnorm(100)
Y <- 0.7*M + rnorm(100)
dat <- data.frame(X = X, Y = Y, M = M)
mod <- ' # direct effect
Y ~ c*X
# mediator
M ~ a*X
Y ~ b*M
# indirect effect (a*b)
ind := a*b
# total effect
total := ind + c
'
fit <- sem(mod, data = dat)
summary(fit, ci = TRUE) # print delta-method CIs
## Automatically extract information from lavaan object
set.seed(1234)
monteCarloCI(fit) # CIs more robust than delta method in smaller samples
## delta method for standardized solution
standardizedSolution(fit)
## compare to Monte Carlo CIs:
set.seed(1234)
monteCarloCI(fit, standardized = TRUE)
## save samples to calculate more precise intervals:
set.seed(1234)
foo <- monteCarloCI(fit, append.samples = TRUE)
# library(HDInterval) # not a dependency; must be installed
# hdi(foo$Samples)
## Parameters can also be obtained from an external analysis
myParams <- c("a","b","c")
(coefs <- coef(fit)[myParams]) # names must match those in the "expression"
## Asymptotic covariance matrix from an external analysis
(AsyCovMat <- vcov(fit)[myParams, myParams])
## Compute CI, include a plot
set.seed(1234)
monteCarloCI(expr = c(ind = 'a*b', total = 'ind + c',
## other arbitrary functions are also possible
meaningless = 'sqrt(a)^b / log(abs(c))'),
coefs = coefs, ACM = AsyCovMat,
plot = TRUE, ask = TRUE) # print a plot for each
Calculate more fit indices
Description
Calculate more fit indices that are not already provided in lavaan.
Usage
moreFitIndices(object, fit.measures = "all", nPrior = 1)
Arguments
object |
The lavaan model object provided after running the |
fit.measures |
Additional fit measures to be calculated. All additional fit measures are calculated by default |
nPrior |
The sample size on which prior is based. This argument is used
to compute |
Details
See nullRMSEA()
for the further details of the computation of
RMSEA of the null model.
Gamma-Hat (gammaHat; West, Taylor, & Wu, 2012) is a global goodness-of-fit index which can be computed (assuming equal numbers of indicators across groups) by
\hat{\Gamma} = \frac{p}{p + 2 \times \frac{\chi^{2}_{k} - df_{k}}{N}},
where p is the number of variables in the model, \chi^{2}_{k} is the \chi^2 test statistic of the target model, df_{k} is the degrees of freedom of the target model, and N is the sample size (or the sample size minus the number of groups if mimic is set to "EQS").
Adjusted Gamma-Hat (adjGammaHat; West, Taylor, & Wu, 2012) is a global fit index which can be computed by
\hat{\Gamma}_\textrm{adj} = \left(1 - \frac{K \times p \times (p + 1)}{2 \times df_{k}} \right) \times \left( 1 - \hat{\Gamma} \right),
where K is the number of groups (see Dudgeon, 2004, for the multiple-group adjustment of adjGammaHat).
Note that if Satorra–Bentler's or Yuan–Bentler's method is used, the fit indices based on the scaled \chi^2 values are also provided.
The remaining indices are information criteria calculated using the
object
's -2 \times
log-likelihood, abbreviated -2LL
.
Corrected Akaike Information Criterion (aic.smallN
; Burnham &
Anderson, 2003) is a corrected version of AIC for small sample size, often
abbreviated AICc:
\textrm{AIC}_{\textrm{small}-N} = AIC + \frac{2q(q + 1)}{N - q - 1},
where AIC
is the original AIC: -2LL + 2q
(where q
= the number of estimated parameters in the target model). Note that AICc is
a small-sample correction derived for univariate regression models, so it is
probably not appropriate for comparing SEMs.
Corrected Bayesian Information Criterion (bic.priorN
; Kuha, 2004) is
similar to BIC but explicitly specifying the sample size on which the prior
is based (N_{prior}
) using the nPrior
argument.
\textrm{BIC}_{\textrm{prior}-N} = -2LL + q\log{( 1 + \frac{N}{N_{prior}} )}.
Bollen et al. (2012, 2014) discussed additional BICs that incorporate more
terms from a Taylor series expansion, which the standard BIC drops. The
"Scaled Unit-Information Prior" BIC is calculated depending on whether the
product of the vector of estimated model parameters (\hat{\theta}
) and
the observed information matrix (FIM) exceeds the number of estimated model
parameters (Case 1) or not (Case 2), which is checked internally:
\textrm{SPBIC}_{\textrm{Case 1}} = -2LL + q(1 - \frac{q}{\hat{\theta}^{'} \textrm{FIM} \hat{\theta}}), \textrm{ or}
\textrm{SPBIC}_{\textrm{Case 2}} = -2LL + \hat{\theta}^{'} \textrm{FIM} \hat{\theta},
Note that this implementation of SPBIC is calculated on the assumption that priors for all estimated parameters are centered at zero, which is inappropriate for most SEMs (e.g., variances should not have priors centered at the lowest possible value; Bollen, 2014, p. 6).
Bollen et al. (2014, eq. 14) credit the HBIC to Haughton (1988):
\textrm{HBIC} = -2LL + q\log{\frac{N}{2 \pi}}.
Bollen et al. (2012, p. 305) proposed the information matrix (I
)-based BIC by
adding another term:
\textrm{IBIC} = -2LL + q\log{\frac{N}{2 \pi}} + \log{\det{\textrm{FIM}}},
or equivalently, using the inverse information (the asymptotic sampling covariance matrix of estimated parameters: ACOV):
\textrm{IBIC} = -2LL - q\log{2 \pi} - \log{\det{\textrm{ACOV}}}.
Stochastic information criterion (SIC; see Preacher, 2006, for details) is
similar to IBIC but does not include the q\log{2 \pi}
term that is
also in HBIC. SIC and IBIC both account for model complexity in a model's
functional form, not merely the number of free parameters. The SIC can be
computed as:
\textrm{SIC} = -2LL + q\log{N} + \log{\det{\textrm{FIM}}} = -2LL - \log{\det{\textrm{ACOV}}}.
Hannan–Quinn Information Criterion (HQC; Hannan & Quinn, 1979) is used for model selection, similar to AIC or BIC.
\textrm{HQC} = -2LL + 2q\log{(\log{N})},
The Bozdogan Information Complexity (ICOMP) criterion (Howe et al., 2011) penalizes the covariance complexity of the model, rather than the number of free parameters directly:
\textrm{ICOMP} = -2LL + s \times \log{\left(\frac{\bar{\lambda}_a}{\bar{\lambda}_g}\right)}.
Value
A numeric lavaan.vector including any of the following requested via fit.measures=:
- gammaHat: Gamma-Hat
- adjGammaHat: Adjusted Gamma-Hat
- baseline.rmsea: RMSEA of the default baseline (i.e., independence) model
- gammaHat.scaled: Gamma-Hat using the scaled \chi^2
- adjGammaHat.scaled: Adjusted Gamma-Hat using the scaled \chi^2
- baseline.rmsea.scaled: RMSEA of the default baseline (i.e., independence) model using the scaled \chi^2
- aic.smallN: Corrected (for small sample size) AIC
- bic.priorN: BIC with specified prior sample size
- spbic: Scaled Unit-Information Prior BIC (SPBIC)
- hbic: Haughton's BIC (HBIC)
- ibic: Information-matrix-based BIC (IBIC)
- sic: Stochastic Information Criterion (SIC)
- hqc: Hannan–Quinn Information Criterion (HQC)
- icomp: Bozdogan Information Complexity (ICOMP) criterion
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Aaron Boulton (University of Delaware)
Ruben Arslan (Humboldt-University of Berlin, rubenarslan@gmail.com)
Yves Rosseel (Ghent University; Yves.Rosseel@UGent.be)
Mauricio Garnier-Villarreal (Vrije Universiteit Amsterdam; mgv@pm.me)
A great deal of feedback was provided by Kris Preacher regarding Bollen et al.'s (2012, 2014) extensions of BIC.
References
Bollen, K. A., Ray, S., Zavisca, J., & Harden, J. J. (2012). A comparison of Bayes factor approximation methods including two new methods. Sociological Methods & Research, 41(2), 294–324. doi:10.1177/0049124112452393
Bollen, K. A., Harden, J. J., Ray, S., & Zavisca, J. (2014). BIC and alternative Bayesian information criteria in the selection of structural equation models. Structural Equation Modeling, 21(1), 1–19. doi:10.1080/10705511.2014.856691
Burnham, K., & Anderson, D. (2003). Model selection and multimodel inference: A practical information-theoretic approach. New York, NY: Springer-Verlag.
Dudgeon, P. (2004). A note on extending Steiger's (1998) multiple sample RMSEA adjustment to other noncentrality parameter-based statistic. Structural Equation Modeling, 11(3), 305–319. doi:10.1207/s15328007sem1103_1
Howe, E. D., Bozdogan, H., & Katragadda, S. (2011). Structural equation modeling (SEM) of categorical and mixed-data using the novel Gifi transformations and information complexity (ICOMP) criterion. Istanbul University Journal of the School of Business Administration, 40(1), 86–123.
Kuha, J. (2004). AIC and BIC: Comparisons of assumptions and performance. Sociological Methods & Research, 33(2), 188–229. doi:10.1177/0049124103262065
Preacher, K. J. (2006). Quantifying parsimony in structural equation modeling. Multivariate Behavioral Research, 41(3), 227–259. doi:10.1207/s15327906mbr4103_1
West, S. G., Taylor, A. B., & Wu, W. (2012). Model fit and model selection in structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 209–231). New York, NY: Guilford.
See Also
- miPowerFit(): For the modification indices and their power approach for model fit evaluation
- nullRMSEA(): For RMSEA of the default independence model
Examples
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)
moreFitIndices(fit)
fit2 <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "mlr")
moreFitIndices(fit2)
Generate Non-normal Data using Vale and Maurelli (1983) method
Description
Generate non-normal data using the Vale and Maurelli (1983) method. The function is designed to be as similar as possible to the popular mvrnorm function in the MASS package. The argument-checking code was adapted from the mvrnorm function in the MASS package, and the data are generated with the Vale and Maurelli (1983) method as implemented in the lavaan package.
Usage
mvrnonnorm(n, mu, Sigma, skewness = NULL, kurtosis = NULL,
empirical = FALSE)
Arguments
n |
Sample size |
mu |
A mean vector. If elements are named, those will be used as variable names in the returned data matrix. |
Sigma |
A positive-definite symmetric matrix specifying the covariance
matrix of the variables. If rows or columns are named (and |
skewness |
A vector of skewness of the variables |
kurtosis |
A vector of excess kurtosis values of the variables |
empirical |
deprecated, ignored. |
Value
A data matrix
Author(s)
The original function is the lavaan::simulateData() function written by Yves Rosseel in the lavaan package. The function was adapted for convenient use by Sunthud Pornprasertmanit (psunthud@gmail.com). Terrence D. Jorgensen added the feature to retain variable names from mu or Sigma.
References
Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48(3), 465–471. doi:10.1007/BF02293687
Examples
set.seed(123)
mvrnonnorm(20, c(1, 2), matrix(c(10, 2, 2, 5), 2, 2),
skewness = c(5, 2), kurtosis = c(3, 3))
## again, with variable names specified in mu
set.seed(123)
mvrnonnorm(20, c(a = 1, b = 2), matrix(c(10, 2, 2, 5), 2, 2),
skewness = c(5, 2), kurtosis = c(3, 3))
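To gauge how closely the requested moments are reproduced, the generated data can be checked with semTools' own skew() and kurtosis() functions (a quick sketch; larger samples will match the targets more closely):
## check the simulated skewness and excess kurtosis against the targets
set.seed(123)
dat <- mvrnonnorm(1000, c(1, 2), matrix(c(10, 2, 2, 5), 2, 2),
                  skewness = c(5, 2), kurtosis = c(3, 3))
apply(dat, 2, skew)      # sample skewness
apply(dat, 2, kurtosis)  # sample excess kurtosis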
Nesting and Equivalence Testing
Description
This test examines whether pairs of SEMs are nested or equivalent.
Usage
net(..., crit = 1e-04)
Arguments
... |
The |
crit |
The upper-bound criterion for testing the equivalence of models.
Models are considered nested (or equivalent) if the difference between
their |
Details
The concept of nesting/equivalence should be the same regardless of
estimation method. However, the particular method of testing
nesting/equivalence (as described in Bentler & Satorra, 2010) employed by
the net
function analyzes summary statistics (model-implied means and
covariance matrices, not raw data). In the case of robust methods like MLR,
the raw data is only utilized for the robust adjustment to SE and chi-sq,
and the net function only checks the unadjusted chi-sq for the purposes of
testing nesting/equivalence. This method also applies to models for
categorical data, following the procedure described by Asparouhov & Muthen
(2019).
Value
The Net object representing the results of nesting and equivalence testing, including a logical matrix of test results and a vector of degrees of freedom for each model.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Bentler, P. M., & Satorra, A. (2010). Testing model nesting and equivalence. Psychological Methods, 15(2), 111–123. doi:10.1037/a0019625
Asparouhov, T., & Muthen, B. (2019). Nesting and equivalence testing for structural equation models. Structural Equation Modeling, 26(2), 302–309. doi:10.1080/10705511.2018.1513795
Examples
m1 <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
m2 <- ' f1 =~ x1 + x2 + x3 + x4
f2 =~ x5 + x6 + x7 + x8 + x9 '
m3 <- ' visual =~ x1 + x2 + x3
textual =~ eq*x4 + eq*x5 + eq*x6
speed =~ x7 + x8 + x9 '
fit1 <- cfa(m1, data = HolzingerSwineford1939)
fit1a <- cfa(m1, data = HolzingerSwineford1939, std.lv = TRUE) # Equivalent to fit1
fit2 <- cfa(m2, data = HolzingerSwineford1939) # Not equivalent to or nested in fit1
fit3 <- cfa(m3, data = HolzingerSwineford1939) # Nested in fit1 and fit1a
tests <- net(fit1, fit1a, fit2, fit3)
tests
summary(tests)
Calculate the RMSEA of the null model
Description
Calculate the RMSEA of the null (baseline) model
Usage
nullRMSEA(object, scaled = FALSE, silent = FALSE)
Arguments
object |
The lavaan model object provided after running the |
scaled |
If |
silent |
If |
Details
The RMSEA of the null model is calculated similarly to the formula used in the lavaan package. The standard formula for RMSEA is
RMSEA = \sqrt{\frac{\chi^2}{N \times df} - \frac{1}{N}} \times \sqrt{G},
where \chi^2 is the chi-square test statistic value of the target model, N is the total sample size, df is the degrees of freedom of the hypothesized model, and G is the number of groups. Kenny proposed on his website that:
"A reasonable rule of thumb is to examine the RMSEA for the null model and make sure that is no smaller than 0.158. An RMSEA for the model of 0.05 and a TLI of .90, implies that the RMSEA of the null model is 0.158. If the RMSEA for the null model is less than 0.158, an incremental measure of fit may not be that informative."
See also http://davidakenny.net/cm/fit.htm
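The computation above can be sketched by hand from the baseline-model test statistic stored in a fitted lavaan object (nullRMSEA() automates this and can also use the scaled statistic):
## sketch: hand computation of the null-model RMSEA
library(lavaan)
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)
X2 <- fitMeasures(fit, "baseline.chisq")
df <- fitMeasures(fit, "baseline.df")
N  <- lavInspect(fit, "ntotal")
G  <- lavInspect(fit, "ngroups")
unname(sqrt(max(X2 / (N * df) - 1 / N, 0)) * sqrt(G))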
Value
A value of RMSEA of the null model (a numeric
vector)
returned invisibly.
Author(s)
Ruben Arslan (Humboldt-University of Berlin, rubenarslan@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Kenny, D. A., Kaniskan, B., & McCoach, D. B. (2015). The performance of RMSEA in models with small degrees of freedom. Sociological Methods & Research, 44(3), 486–507. doi:10.1177/0049124114543236
See Also
- miPowerFit(): For the modification indices and their power approach for model fit evaluation
- moreFitIndices(): For other fit indices
Examples
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)
nullRMSEA(fit)
Random Allocation of Items to Parcels in a Structural Equation Model
Description
This function generates a specified number of random item-to-parcel allocations, fits a model to each allocation, and provides results averaged across all allocations.
Usage
parcelAllocation(model, data, parcel.names, item.syntax, nAlloc = 100,
fun = "sem", alpha = 0.05, fit.measures = c("chisq", "df", "cfi",
"tli", "rmsea", "srmr"), ..., show.progress = FALSE, iseed = 12345,
do.fit = TRUE, return.fit = FALSE, warn = FALSE)
Arguments
model |
|
data |
A |
parcel.names |
|
item.syntax |
|
nAlloc |
The number of random items-to-parcels allocations to generate. |
fun |
|
alpha |
Alpha level used as criterion for significance. |
fit.measures |
|
... |
Additional arguments to be passed to
|
show.progress |
If |
iseed |
(Optional) Random seed used for parceling items. When the same
random seed is specified and the program is re-run, the same allocations
will be generated. Using the same |
do.fit |
If |
return.fit |
If |
warn |
Whether to print warnings when fitting |
Details
This function implements the random item-to-parcel allocation procedure
described in Sterba (2011) and Sterba and MacCallum (2010). The function
takes a single data set with item-level data, randomly assigns items to
parcels, fits a structural equation model to the parceled data using
lavaan::lavaanList()
, and repeats this process for a user-specified
number of random allocations. Results from all fitted models are summarized
in the output. For further details on the benefits of randomly allocating
items to parcels, see Sterba (2011) and Sterba and MacCallum (2010).
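To make the procedure concrete, the sketch below shows what a single random allocation might look like for one 9-item factor split into three 3-item parcels; parcelAllocation() repeats this nAlloc times, fits the model each time, and summarizes the results:
## sketch: one random item-to-parcel allocation (illustration only)
set.seed(1)
items <- paste0("f1item", 1:9)                     # item names used in simParcel
alloc <- split(sample(items), rep(1:3, each = 3))  # random 3-item parcels
names(alloc) <- paste0("par", 1:3)
alloc
## parcel scores would then be computed as the mean of each item set, e.g.:
## par.scores <- sapply(alloc, function(v) rowMeans(simParcel[, v]))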
Value
Estimates |
A |
SE |
A |
Fit |
A |
Model |
A lavaan::lavaanList object containing results
of the |
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Sterba, S. K. (2011). Implications of parcel-allocation variability for comparing fit of item-solutions and parcel-solutions. Structural Equation Modeling, 18(4), 554–577. doi:10.1080/10705511.2011.607073
Sterba, S. K. & MacCallum, R. C. (2010). Variability in parameter estimates and model fit across random allocations of items to parcels. Multivariate Behavioral Research, 45(2), 322–358. doi:10.1080/00273171003680302
Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation variability in practice: Combining sources of uncertainty and choosing the number of allocations. Multivariate Behavioral Research, 51(2–3), 296–313. doi:10.1080/00273171.2016.1144502
Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model selection: Parcel-allocation variability in model ranking. Psychological Methods, 22(1), 47–68. doi:10.1037/met0000067
See Also
PAVranking()
for comparing 2 models,
poolMAlloc()
for choosing the number of allocations
Examples
## Fit 2-factor CFA to simulated data. Each factor has 9 indicators.
## Specify the item-level model (if NO parcels were created)
item.syntax <- c(paste0("f1 =~ f1item", 1:9),
paste0("f2 =~ f2item", 1:9))
cat(item.syntax, sep = "\n")
## Below, we reduce the size of this same model by
## applying different parceling schemes
## 3-indicator parcels
mod.parcels <- '
f1 =~ par1 + par2 + par3
f2 =~ par4 + par5 + par6
'
## names of parcels
(parcel.names <- paste0("par", 1:6))
## override default random-number generator to use parallel options
RNGkind("L'Ecuyer-CMRG")
parcelAllocation(mod.parcels, data = simParcel, nAlloc = 100,
parcel.names = parcel.names, item.syntax = item.syntax,
# parallel = "multicore", # parallel available in Mac/Linux
                 std.lv = TRUE) # any additional lavaan arguments
## POOL RESULTS by treating parcel allocations as multiple imputations
## Details provided in Sterba & Rights (2016); see ?poolMAlloc.
## save list of data sets instead of fitting model yet
dataList <- parcelAllocation(mod.parcels, data = simParcel, nAlloc = 100,
parcel.names = parcel.names,
item.syntax = item.syntax,
do.fit = FALSE)
## now fit the model to each data set
library(lavaan.mi)
fit.parcels <- cfa.mi(mod.parcels, data = dataList, std.lv = TRUE)
summary(fit.parcels) # pooled using Rubin's rules
anova(fit.parcels) # pooled test statistic
help(package = "lavaan.mi") # find more methods for pooling results
## multigroup example
simParcel$group <- 0:1 # arbitrary groups for example
mod.mg <- '
f1 =~ par1 + c(L2, L2)*par2 + par3
f2 =~ par4 + par5 + par6
'
## names of parcels
(parcel.names <- paste0("par", 1:6))
parcelAllocation(mod.mg, data = simParcel, parcel.names, item.syntax,
std.lv = TRUE, group = "group", group.equal = "loadings",
nAlloc = 20, show.progress = TRUE)
## parcels for first factor, items for second factor
mod.items <- '
f1 =~ par1 + par2 + par3
f2 =~ f2item2 + f2item7 + f2item8
'
## names of parcels
(parcel.names <- paste0("par", 1:3))
parcelAllocation(mod.items, data = simParcel, parcel.names, item.syntax,
nAlloc = 20, std.lv = TRUE)
## mixture of 1- and 3-indicator parcels for second factor
mod.mix <- '
f1 =~ par1 + par2 + par3
f2 =~ f2item2 + f2item7 + f2item8 + par4 + par5 + par6
'
## names of parcels
(parcel.names <- paste0("par", 1:6))
parcelAllocation(mod.mix, data = simParcel, parcel.names, item.syntax,
nAlloc = 20, std.lv = TRUE)
Partial Measurement Invariance Testing Across Groups
Description
This function provides partial invariance testing by (a) freeing a parameter one-by-one from the nested model and comparing it with the original nested model, or (b) fixing (or constraining) a parameter one-by-one from the parent model and comparing it with the original parent model. This function only works with congeneric models. partialInvariance is used for continuous variables; partialInvarianceCat is used for categorical variables.
Usage
partialInvariance(fit, type, free = NULL, fix = NULL, refgroup = 1,
poolvar = TRUE, p.adjust = "none", fbound = 2, return.fit = FALSE,
method = "satorra.bentler.2001")
partialInvarianceCat(fit, type, free = NULL, fix = NULL, refgroup = 1,
poolvar = TRUE, p.adjust = "none", return.fit = FALSE,
method = "satorra.bentler.2001")
Arguments
fit |
A list of models for invariance testing. Each model should be
assigned by appropriate names (see details). The result from
|
type |
The types of invariance testing: "metric", "scalar", "strict", or "means" |
free |
A vector of variable names that are free across groups in advance. If partial mean invariance is tested, this argument represents a vector of factor names that are free across groups. |
fix |
A vector of variable names that are constrained to be equal across groups in advance. If partial mean invariance is tested, this argument represents a vector of factor names that are fixed across groups. |
refgroup |
The reference group used to make the effect size comparison with the other groups. |
poolvar |
If |
p.adjust |
The method used to adjust p values. See
|
fbound |
The z-scores of factor that is used to calculate the effect size of the loading difference proposed by Millsap and Olivera-Aguilar (2012). |
return.fit |
Return the submodels fitted by this function |
method |
The method used to calculate likelihood ratio test. See
|
Details
There are four types of partial invariance testing:
Partial weak invariance. The model named 'fit.configural' from the list of models is compared with the model named 'fit.loadings'. Each loading will be freed or fixed from the metric and configural invariance models respectively. The modified models are compared with the original model. Note that the objects in the list of models must have the names of "fit.configural" and "fit.loadings". Users may use "metric", "weak", "loading", or "loadings" in the
type
argument. Note that, for testing invariance on marker variables, other variables will be assigned as marker variables automatically.Partial strong invariance. The model named 'fit.loadings' from the list of models is compared with the model named either 'fit.intercepts' or 'fit.thresholds'. Each intercept will be freed or fixed from the scalar and metric invariance models respectively. The modified models are compared with the original model. Note that the objects in the list of models must have the names of "fit.loadings" and either "fit.intercepts" or "fit.thresholds". Users may use "scalar", "strong", "intercept", "intercepts", "threshold", or "thresholds" in the
type
argument. Note that, for testing invariance on marker variables, other variables will be assigned as marker variables automatically. Note that if all variables are dichotomous, scalar invariance testing is not available.Partial strict invariance. The model named either 'fit.intercepts' or 'fit.thresholds' (or 'fit.loadings') from the list of models is compared with the model named 'fit.residuals'. Each residual variance will be freed or fixed from the strict and scalar (or metric) invariance models respectively. The modified models are compared with the original model. Note that the objects in the list of models must have the names of "fit.residuals" and either "fit.intercepts", "fit.thresholds", or "fit.loadings". Users may use "strict", "residual", "residuals", "error", or "errors" in the
type
argument.Partial mean invariance. The model named either 'fit.intercepts' or 'fit.thresholds' (or 'fit.residuals' or 'fit.loadings') from the list of models is compared with the model named 'fit.means'. Each factor mean will be freed or fixed from the means and scalar (or strict or metric) invariance models respectively. The modified models are compared with the original model. Note that the objects in the list of models must have the names of "fit.means" and either "fit.residuals", "fit.intercepts", "fit.thresholds", or "fit.loadings". Users may use "means" or "mean" in the
type
argument.
Three types of comparisons are used in this function:
- free: The nested model is used as a template. Then, one parameter indicating a difference between the two models is freed. The new model is compared with the nested model. This process is repeated for all differences between the two models. The likelihood-ratio test and the difference in CFI are provided.
- fix: The parent model is used as a template. Then, one parameter indicating a difference between the two models is fixed or constrained to be equal to other parameters. The new model is then compared with the parent model. This process is repeated for all differences between the two models. The likelihood-ratio test and the difference in CFI are provided.
- wald: This method is similar to the fix method. However, instead of building a new model and comparing it with a likelihood-ratio test, a multivariate Wald test is used to test equality between parameter estimates. See lavaan::lavTestWald() for further details. Note that if any rows of the contrast cannot be summed to 0 (e.g., when comparing two means where one of the means is fixed at 0), the Wald test is not provided. This test statistic is not as accurate as the likelihood-ratio test provided in fix, but it is offered here in case the likelihood-ratio test fails to converge.
Note that, by default, this function does not adjust for the inflated Type I error rate from multiple tests (but see the p.adjust argument). The degrees of freedom of all tests equal the number of groups minus 1.
The details of the standardized estimates and the effect sizes used for each parameter are provided in the vignette by running vignette("partialInvariance").
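Given the multiple-testing caveat above, one reasonable precaution is to request adjusted p values via the p.adjust= argument, as in this sketch (which rebuilds the models list along the lines of the Examples below):
## sketch: Holm-adjusted p values for the one-by-one invariance tests
library(lavaan)
HW.model <- ' visual =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed =~ x7 + x8 + x9 '
configural <- cfa(HW.model, data = HolzingerSwineford1939, group = "school")
weak <- cfa(HW.model, data = HolzingerSwineford1939, group = "school",
            group.equal = "loadings")
models <- list(fit.configural = configural, fit.loadings = weak)
partialInvariance(models, "metric", p.adjust = "holm")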
Value
A list of results is provided, consisting of at least two elements:
- estimates: The parameter estimates, including pooled estimates (poolest), the estimates for each group, standardized estimates for each group (std), the difference in standardized values, and the effect size statistic (q for factor-loading differences and h for error-variance differences). See the details of this effect size statistic by running vignette("partialInvariance"). In the partialInvariance function, the additional effect-size statistics proposed by Millsap and Olivera-Aguilar (2012) are provided. For factor loadings, the additional outputs are the observed mean difference (diff_mean), the mean difference if factor scores are low (low_fscore), and the mean difference if factor scores are high (high_fscore). The low factor score is calculated by (a) finding, in all groups, the factor scores whose z scores equal -bound (the default is -2) and (b) picking the minimum value among those factor scores. The high factor score is calculated by (a) finding, in all groups, the factor scores whose z scores equal bound (default = 2) and (b) picking the maximum value among those factor scores. For measurement intercepts, the additional outputs are the observed mean difference (diff_mean) and the proportion of the difference in the intercepts over the observed mean difference (propdiff). For error variances, the additional output is the proportion of the difference in error variances over the difference in observed variances (propdiff).
- results: Statistical tests as well as the change in CFI are provided. \chi^2 and p values are provided for all methods.
- models: The submodels used in the free and fix methods, as well as the nested and parent models. The nested and parent models will be changed from the original models if the free or fix arguments are specified.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Millsap, R. E., & Olivera-Aguilar, M. (2012). Investigating measurement invariance using confirmatory factor analysis. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 380–392). New York, NY: Guilford.
See Also
measurementInvariance()
for measurement invariance for
continuous variables; measurementInvarianceCat()
for measurement
invariance for categorical variables; lavaan::lavTestWald()
for
multivariate Wald test
Examples
## Conduct weak invariance testing manually by using fixed-factor
## method of scale identification
library(lavaan)
conf <- "
f1 =~ NA*x1 + x2 + x3
f2 =~ NA*x4 + x5 + x6
f1 ~~ c(1, 1)*f1
f2 ~~ c(1, 1)*f2
"
weak <- "
f1 =~ NA*x1 + x2 + x3
f2 =~ NA*x4 + x5 + x6
f1 ~~ c(1, NA)*f1
f2 ~~ c(1, NA)*f2
"
configural <- cfa(conf, data = HolzingerSwineford1939, std.lv = TRUE, group="school")
weak <- cfa(weak, data = HolzingerSwineford1939, group="school", group.equal="loadings")
models <- list(fit.configural = configural, fit.loadings = weak)
partialInvariance(models, "metric")
partialInvariance(models, "metric", free = "x5") # "x5" is free across groups in advance
partialInvariance(models, "metric", fix = "x4") # "x4" is fixed across groups in advance
## Use the result from the measurementInvariance function
HW.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
models2 <- measurementInvariance(model = HW.model, data=HolzingerSwineford1939,
group="school")
partialInvariance(models2, "scalar")
## Conduct weak invariance testing manually by using fixed-factor
## method of scale identification for dichotomous variables
f <- rnorm(1000, 0, 1)
u1 <- 0.9*f + rnorm(1000, 1, sqrt(0.19))
u2 <- 0.8*f + rnorm(1000, 1, sqrt(0.36))
u3 <- 0.6*f + rnorm(1000, 1, sqrt(0.64))
u4 <- 0.7*f + rnorm(1000, 1, sqrt(0.51))
u1 <- as.numeric(cut(u1, breaks = c(-Inf, 0, Inf)))
u2 <- as.numeric(cut(u2, breaks = c(-Inf, 0.5, Inf)))
u3 <- as.numeric(cut(u3, breaks = c(-Inf, 0, Inf)))
u4 <- as.numeric(cut(u4, breaks = c(-Inf, -0.5, Inf)))
g <- rep(c(1, 2), 500)
dat2 <- data.frame(u1, u2, u3, u4, g)
configural2 <- "
f1 =~ NA*u1 + u2 + u3 + u4
u1 | c(t11, t11)*t1
u2 | c(t21, t21)*t1
u3 | c(t31, t31)*t1
u4 | c(t41, t41)*t1
f1 ~~ c(1, 1)*f1
f1 ~ c(0, NA)*1
u1 ~~ c(1, 1)*u1
u2 ~~ c(1, NA)*u2
u3 ~~ c(1, NA)*u3
u4 ~~ c(1, NA)*u4
"
outConfigural2 <- cfa(configural2, data = dat2, group = "g",
parameterization = "theta", estimator = "wlsmv",
ordered = c("u1", "u2", "u3", "u4"))
weak2 <- "
f1 =~ NA*u1 + c(f11, f11)*u1 + c(f21, f21)*u2 + c(f31, f31)*u3 + c(f41, f41)*u4
u1 | c(t11, t11)*t1
u2 | c(t21, t21)*t1
u3 | c(t31, t31)*t1
u4 | c(t41, t41)*t1
f1 ~~ c(1, NA)*f1
f1 ~ c(0, NA)*1
u1 ~~ c(1, 1)*u1
u2 ~~ c(1, NA)*u2
u3 ~~ c(1, NA)*u3
u4 ~~ c(1, NA)*u4
"
outWeak2 <- cfa(weak2, data = dat2, group = "g", parameterization = "theta",
estimator = "wlsmv", ordered = c("u1", "u2", "u3", "u4"))
modelsCat <- list(fit.configural = outConfigural2, fit.loadings = outWeak2)
partialInvarianceCat(modelsCat, type = "metric")
partialInvarianceCat(modelsCat, type = "metric", free = "u2")
partialInvarianceCat(modelsCat, type = "metric", fix = "u3")
## Use the result from the measurementInvarianceCat function
model <- ' f1 =~ u1 + u2 + u3 + u4
f2 =~ u5 + u6 + u7 + u8'
modelsCat2 <- measurementInvarianceCat(model = model, data = datCat, group = "g",
parameterization = "theta",
estimator = "wlsmv", strict = TRUE)
partialInvarianceCat(modelsCat2, type = "scalar")
Permutation Randomization Tests of Measurement Equivalence and Differential Item Functioning (DIF)
Description
The function permuteMeasEq
provides tests of hypotheses involving
measurement equivalence, in one of two frameworks: multigroup CFA or MIMIC
models.
Usage
permuteMeasEq(nPermute, modelType = c("mgcfa", "mimic"), con, uncon = NULL,
null = NULL, param = NULL, freeParam = NULL, covariates = NULL,
AFIs = NULL, moreAFIs = NULL, maxSparse = 10, maxNonconv = 10,
showProgress = TRUE, warn = -1, datafun, extra,
parallelType = c("none", "multicore", "snow"), ncpus = NULL, cl = NULL,
iseed = 12345)
Arguments
nPermute |
An integer indicating the number of random permutations used to form empirical distributions under the null hypothesis. |
modelType |
A character string indicating type of model employed:
multiple-group CFA ( |
con |
The constrained |
uncon |
Optional. The unconstrained |
null |
Optional. A |
param |
An optional character vector or list of character vectors
indicating which parameters the user would test for DIF following a
rejection of the omnibus null hypothesis tested using
( |
freeParam |
An optional character vector, silently ignored when
|
covariates |
An optional character vector, only applicable when
|
AFIs |
A character vector indicating which alternative fit indices (or
chi-squared itself) are to be used to test the multiparameter omnibus null
hypothesis that the constraints specified in |
moreAFIs |
Optional. A character vector indicating which (if any)
alternative fit indices returned by |
maxSparse |
Only applicable when |
maxNonconv |
An integer indicating the maximum number of consecutive
times that a random permutation can yield a sample for which the model does
not converge on a solution. If such a sample occurs, permutation is
attempted repeatedly until a sample is obtained for which the model does
converge. If |
showProgress |
Logical. Indicating whether to display a progress bar
while permuting. Silently set to |
warn |
Sets the handling of warning messages when fitting model(s) to
permuted data sets. See |
datafun |
An optional function that can be applied to the data
(extracted from |
extra |
An optional function that can be applied to any (or all) of the
fitted lavaan objects ( |
parallelType |
The type of parallel operation to be used (if any). The
default is |
ncpus |
Integer: number of processes to be used in parallel operation.
If |
cl |
An optional parallel or snow cluster for use when
|
iseed |
Integer: Only used to set the states of the RNG when using
parallel options, in which case |
Details
The function permuteMeasEq
provides tests of hypotheses involving
measurement equivalence, in one of two frameworks:
1 For multiple-group CFA models, provide a pair of nested lavaan objects, the less constrained of which (
uncon
) freely estimates a set of measurement parameters (e.g., factor loadings, intercepts, or thresholds; specified inparam
) in all groups, and the more constrained of which (con
) constrains those measurement parameters to equality across groups. Group assignment is repeatedly permuted and the models are fit to each permutation, in order to produce an empirical distribution under the null hypothesis of no group differences, both for (a) changes in user-specified fit measures (seeAFIs
andmoreAFIs
) and for (b) the maximum modification index among the user-specified equality constraints. Configural invariance can also be tested by providing that fitted lavaan object tocon
and leavinguncon = NULL
, in which caseparam
must beNULL
as well.2 In MIMIC models, one or a set of continuous and/or discrete
covariates
can be permuted, and a constrained model is fit to each permutation in order to provide a distribution of any fit measures (namely, the maximum modification index among fixed parameters inparam
) under the null hypothesis of measurement equivalence across levels of those covariates.
In either framework, modification indices for equality constraints or fixed
parameters specified in param
are calculated from the constrained
model (con
) using the function lavaan::lavTestScore()
.
For multiple-group CFA models, the multiparameter omnibus null hypothesis of
measurement equivalence/invariance is that there are no group differences in
any measurement parameters (of a particular type). This can be tested using
the anova
method on nested lavaan
objects, as seen in the
output of measurementInvariance()
, or by inspecting
the change in alternative fit indices (AFIs) such as the CFI. The
permutation randomization method employed by permuteMeasEq
generates
an empirical distribution of any AFIs
under the null hypothesis, so
the user is not restricted to using fixed cutoffs proposed by Cheung &
Rensvold (2002), Chen (2007), or Meade, Johnson, & Braddy (2008).
If the multiparameter omnibus null hypothesis is rejected, partial
invariance can still be established by freeing invalid equality constraints,
as long as equality constraints are valid for at least two indicators per
factor. Modification indices can be calculated from the constrained model
(con
), but multiple testing leads to inflation of Type I error rates.
The permutation randomization method employed by permuteMeasEq
creates a distribution of the maximum modification index if the null
hypothesis is true, which allows the user to control the familywise Type I
error rate in a manner similar to Tukey's q (studentized range)
distribution for the Honestly Significant Difference (HSD) post hoc test.
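Conceptually, the familywise control works as in the following sketch (not the package's internal code; all numbers are hypothetical): each observed modification index is compared against the permutation distribution of the maximum index.
## conceptual illustration of the max-MI permutation p value (hypothetical numbers)
max.MI.dist <- c(3.1, 5.4, 2.2, 7.9, 4.0, 6.3)      # permuted maxima (hypothetical)
obs.MI <- c(x1 = 6.5, x2 = 1.2)                     # observed MIs (hypothetical)
sapply(obs.MI, function(m) mean(max.MI.dist >= m))  # familywise-controlled p values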
For MIMIC models, DIF can be tested by comparing modification indices of
regression paths to the permutation distribution of the maximum modification
index, which controls the familywise Type I error rate. The MIMIC approach
could also be applied with multiple-group models, but the grouping variable
would not be permuted; rather, the covariates would be permuted separately
within each group to preserve between-group differences. So whether
parameters are constrained or unconstrained across groups, the MIMIC
approach is only for testing null hypotheses about the effects of
covariates
on indicators, controlling for common factors.
In either framework, lavaan::lavaan()
's group.label
argument is used to preserve the order of groups seen in con
when
permuting the data.
Value
The permuteMeasEq object representing the results of
testing measurement equivalence (the multiparameter omnibus test) and DIF
(modification indices), as well as diagnostics and any extra
output.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Papers about permutation tests of measurement equivalence:
Jorgensen, T. D., Kite, B. A., Chen, P.-Y., & Short, S. D. (2018). Permutation randomization methods for testing measurement equivalence and detecting differential item functioning in multiple-group confirmatory factor analysis. Psychological Methods, 23(4), 708–728. doi:10.1037/met0000152
Kite, B. A., Jorgensen, T. D., & Chen, P.-Y. (2018). Random permutation testing applied to measurement invariance testing with ordered-categorical indicators. Structural Equation Modeling 25(4), 573–587. doi:10.1080/10705511.2017.1421467
Jorgensen, T. D. (2017). Applying permutation tests and multivariate modification indices to configurally invariant models that need respecification. Frontiers in Psychology, 8(1455). doi:10.3389/fpsyg.2017.01455
Additional reading:
Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14(3), 464–504. doi:10.1080/10705510701301834
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233–255. doi:10.1207/S15328007SEM0902_5
Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93(3), 568–592. doi:10.1037/0021-9010.93.3.568
Widaman, K. F., & Thompson, J. S. (2003). On specifying the null model for incremental fit indices in structural equation modeling. Psychological Methods, 8(1), 16–37. doi:10.1037/1082-989X.8.1.16
See Also
stats::TukeyHSD()
, lavaan::lavTestScore()
,
measurementInvariance()
,
measurementInvarianceCat()
Examples
########################
## Multiple-Group CFA ##
########################
## create 3-group data in lavaan example(cfa) data
HS <- lavaan::HolzingerSwineford1939
HS$ageGroup <- ifelse(HS$ageyr < 13, "preteen",
ifelse(HS$ageyr > 13, "teen", "thirteen"))
## specify and fit an appropriate null model for incremental fit indices
mod.null <- c(paste0("x", 1:9, " ~ c(T", 1:9, ", T", 1:9, ", T", 1:9, ")*1"),
paste0("x", 1:9, " ~~ c(L", 1:9, ", L", 1:9, ", L", 1:9, ")*x", 1:9))
fit.null <- cfa(mod.null, data = HS, group = "ageGroup")
## fit target model with varying levels of measurement equivalence
mod.config <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
fit.config <- cfa(mod.config, data = HS, std.lv = TRUE, group = "ageGroup")
fit.metric <- cfa(mod.config, data = HS, std.lv = TRUE, group = "ageGroup",
group.equal = "loadings")
fit.scalar <- cfa(mod.config, data = HS, std.lv = TRUE, group = "ageGroup",
group.equal = c("loadings","intercepts"))
####################### Permutation Method
## fit indices of interest for multiparameter omnibus test
myAFIs <- c("chisq","cfi","rmsea","mfi","aic")
moreAFIs <- c("gammaHat","adjGammaHat")
## Use only 20 permutations for a demo. In practice,
## use > 1000 to reduce sampling variability of estimated p values
## test configural invariance
set.seed(12345)
out.config <- permuteMeasEq(nPermute = 20, con = fit.config)
out.config
## test metric equivalence
set.seed(12345) # same permutations
out.metric <- permuteMeasEq(nPermute = 20, uncon = fit.config, con = fit.metric,
param = "loadings", AFIs = myAFIs,
moreAFIs = moreAFIs, null = fit.null)
summary(out.metric, nd = 4)
## test scalar equivalence
set.seed(12345) # same permutations
out.scalar <- permuteMeasEq(nPermute = 20, uncon = fit.metric, con = fit.scalar,
param = "intercepts", AFIs = myAFIs,
moreAFIs = moreAFIs, null = fit.null)
summary(out.scalar)
## Not much to see without significant DIF.
## Try using an absurdly high alpha level for illustration.
outsum <- summary(out.scalar, alpha = .50)
## notice that the returned object is the table of DIF tests
outsum
## visualize permutation distribution
hist(out.config, AFI = "chisq")
hist(out.metric, AFI = "chisq", nd = 2, alpha = .01,
legendArgs = list(x = "topright"))
hist(out.scalar, AFI = "cfi", printLegend = FALSE)
####################### Extra Output
## function to calculate expected change of Group-2 and -3 latent means if
## each intercept constraint were released
extra <- function(con) {
output <- list()
output["x1.vis2"] <- lavTestScore(con, release = 19:20, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[70]
output["x1.vis3"] <- lavTestScore(con, release = 19:20, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[106]
output["x2.vis2"] <- lavTestScore(con, release = 21:22, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[70]
output["x2.vis3"] <- lavTestScore(con, release = 21:22, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[106]
output["x3.vis2"] <- lavTestScore(con, release = 23:24, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[70]
output["x3.vis3"] <- lavTestScore(con, release = 23:24, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[106]
output["x4.txt2"] <- lavTestScore(con, release = 25:26, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[71]
output["x4.txt3"] <- lavTestScore(con, release = 25:26, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[107]
output["x5.txt2"] <- lavTestScore(con, release = 27:28, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[71]
output["x5.txt3"] <- lavTestScore(con, release = 27:28, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[107]
output["x6.txt2"] <- lavTestScore(con, release = 29:30, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[71]
output["x6.txt3"] <- lavTestScore(con, release = 29:30, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[107]
output["x7.spd2"] <- lavTestScore(con, release = 31:32, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[72]
output["x7.spd3"] <- lavTestScore(con, release = 31:32, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[108]
output["x8.spd2"] <- lavTestScore(con, release = 33:34, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[72]
output["x8.spd3"] <- lavTestScore(con, release = 33:34, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[108]
output["x9.spd2"] <- lavTestScore(con, release = 35:36, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[72]
output["x9.spd3"] <- lavTestScore(con, release = 35:36, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[108]
output
}
## observed EPC
extra(fit.scalar)
## permutation results, including extra output
set.seed(12345) # same permutations
out.scalar <- permuteMeasEq(nPermute = 20, uncon = fit.metric, con = fit.scalar,
param = "intercepts", AFIs = myAFIs,
moreAFIs = moreAFIs, null = fit.null, extra = extra)
## summarize extra output
summary(out.scalar, extra = TRUE)
###########
## MIMIC ##
###########
## Specify Restricted Factor Analysis (RFA) model, equivalent to MIMIC, but
## the factor covaries with the covariate instead of being regressed on it.
## The covariate defines a single-indicator construct, and the
## double-mean-centered products of the indicators define a latent
## interaction between the factor and the covariate.
mod.mimic <- '
visual =~ x1 + x2 + x3
age =~ ageyr
age.by.vis =~ x1.ageyr + x2.ageyr + x3.ageyr
x1 ~~ x1.ageyr
x2 ~~ x2.ageyr
x3 ~~ x3.ageyr
'
HS.orth <- indProd(var1 = paste0("x", 1:3), var2 = "ageyr", match = FALSE,
data = HS[ , c("ageyr", paste0("x", 1:3))] )
fit.mimic <- cfa(mod.mimic, data = HS.orth, meanstructure = TRUE)
summary(fit.mimic, stand = TRUE)
## Whereas MIMIC models specify direct effects of the covariate on an indicator,
## DIF can be tested in RFA models by specifying free loadings of an indicator
## on the covariate's construct (uniform DIF, scalar invariance) and the
## interaction construct (nonuniform DIF, metric invariance).
param <- as.list(paste0("age + age.by.vis =~ x", 1:3))
names(param) <- paste0("x", 1:3)
# param <- as.list(paste0("x", 1:3, " ~ age + age.by.vis")) # equivalent
## test both parameters simultaneously for each indicator
do.call(rbind, lapply(param, function(x) lavTestScore(fit.mimic, add = x)$test))
## or test each parameter individually
lavTestScore(fit.mimic, add = as.character(param))
####################### Permutation Method
## function to recalculate interaction terms after permuting the covariate
datafun <- function(data) {
d <- data[, c(paste0("x", 1:3), "ageyr")]
indProd(var1 = paste0("x", 1:3), var2 = "ageyr", match = FALSE, data = d)
}
set.seed(12345)
perm.mimic <- permuteMeasEq(nPermute = 20, modelType = "mimic",
con = fit.mimic, param = param,
covariates = "ageyr", datafun = datafun)
summary(perm.mimic)
Class for the Results of Permutation Randomization Tests of Measurement Equivalence and DIF
Description
This class contains the results of tests of Measurement Equivalence and Differential Item Functioning (DIF).
Usage
## S4 method for signature 'permuteMeasEq'
show(object)
## S4 method for signature 'permuteMeasEq'
summary(object, alpha = 0.05, nd = 3,
extra = FALSE)
## S4 method for signature 'permuteMeasEq'
hist(x, ..., AFI, alpha = 0.05, nd = 3,
printLegend = TRUE, legendArgs = list(x = "topleft"))
Arguments
object , x |
object of class |
alpha |
alpha level used to draw confidence limits in |
nd |
number of digits to display |
extra |
|
... |
Additional arguments to pass to |
AFI |
|
printLegend |
|
legendArgs |
|
Value
- The show method prints a summary of the multiparameter omnibus test results, using the user-specified AFIs. The parametric (\Delta)\chi^2 test is also displayed.
- The summary method prints the same information as the show method, but when extra = FALSE (the default) it also provides a table summarizing any requested follow-up tests of DIF using modification indices in slot MI.obs. The user can also specify an alpha level for flagging modification indices as significant, as well as nd (the number of digits displayed). For each modification index, the p value is displayed using a central \chi^2 distribution with the df shown in that column. Additionally, a p value is displayed using the permutation distribution of the maximum index, which controls the familywise Type I error rate in a manner similar to Tukey's studentized range test. If any indices are flagged as significant using the tukey.p.value, then a message is displayed for each flagged index. The invisibly returned data.frame is the displayed table of modification indices, unless permuteMeasEq() was called with param = NULL, in which case the invisibly returned object is object. If extra = TRUE, the permutation-based p values for each statistic returned by the extra function are displayed and returned in a data.frame instead of the modification indices requested in the param argument.
- The hist method returns a list of length == 2, containing the arguments for the call to hist and the arguments for the call to legend, respectively. This list may facilitate creating a customized histogram of AFI.dist, MI.dist, or extra.dist.
Slots
PT
A
data.frame
returned by a call tolavaan::parTable()
on the constrained modelmodelType
A character indicating the specified
modelType
in the call topermuteMeasEq
ANOVA
A
numeric
vector indicating the results of the observed (\Delta
)\chi^2
test, based on the central\chi^2
distributionAFI.obs
A vector of observed (changes in) user-selected fit measures
AFI.dist
The permutation distribution(s) of user-selected fit measures. A
data.frame
withn.Permutations
rows and one column for eachAFI.obs
.AFI.pval
A vector of p values (one for each element in slot
AFI.obs
) calculated using slotAFI.dist
, indicating the probability of observing a change at least as extreme asAFI.obs
if the null hypothesis were trueMI.obs
A
data.frame
of observed Lagrange Multipliers (modification indices) associated with the equality constraints or fixed parameters specified in theparam
argument. This is a subset of the output returned by a call tolavaan::lavTestScore()
on the constrained model.MI.dist
The permutation distribution of the maximum modification index (among those seen in slot
MI.obs$X2
) at each permutation of group assignment or ofcovariates
extra.obs
If
permuteMeasEq
was called with anextra
function, the output when applied to the original data is concatenated into this vectorextra.dist
A
data.frame
, each column of which contains the permutation distribution of the corresponding statistic in slotextra.obs
n.Permutations
An
integer
indicating the number of permutations requested by the usern.Converged
An
integer
indicating the number of permutation iterations that yielded a converged solution
An
integer
vector of lengthn.Permutations
indicating how many times group assignment was randomly permuted (at each iteration) before converging on a solutionn.Sparse
Only relevant with
ordered
indicators whenmodelType == "mgcfa"
. Aninteger
vector of lengthn.Permutations
indicating how many times group assignment was randomly permuted (at each iteration) before obtaining a sample with all categories observed in all groups.oldSeed
An
integer
vector storing the value of.Random.seed
before runningpermuteMeasEq
. Only relevant when using a parallel/multicore option and the originalRNGkind() != "L'Ecuyer-CMRG"
. This enables users to restore their previous.Random.seed
state, if desired, by running:.Random.seed[-1] <- permutedResults@oldSeed[-1]
Objects from the Class
Objects can be created via the
permuteMeasEq()
function.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
permuteMeasEq()
Examples
# See the example from the permuteMeasEq function
Plausible-Values Imputation of Factor Scores Estimated from a lavaan Model
Description
Draw plausible values of factor scores estimated from a fitted
lavaan::lavaan()
model, then treat them as multiple imputations
of missing data using lavaan.mi::lavaan.mi()
.
Usage
plausibleValues(object, nDraws = 20L, seed = 12345,
omit.imps = c("no.conv", "no.se"), ...)
Arguments
object |
A fitted model of class lavaan::lavaan, blavaan::blavaan, or lavaan.mi::lavaan.mi |
nDraws |
|
seed |
|
omit.imps |
|
... |
Optional arguments to pass to |
Details
Because latent variables are unobserved, they can be considered as missing
data, which can be imputed using Monte Carlo methods. This may be of
interest to researchers with sample sizes too small to fit their complex
structural models. Fitting a factor model as a first step,
lavaan::lavPredict()
provides factor-score estimates, which can
be treated as observed values in a path analysis (Step 2). However, the
resulting standard errors and test statistics could not be trusted because
the Step-2 analysis would not take into account the uncertainty about the
estimated factor scores. Using the asymptotic sampling covariance matrix
of the factor scores provided by lavaan::lavPredict()
,
plausibleValues
draws a set of nDraws
imputations from the
sampling distribution of each factor score, returning a list of data sets
that can be treated like multiple imputations of incomplete data. If the
data were already imputed to handle missing data, plausibleValues
also accepts an object of class lavaan.mi::lavaan.mi, and will
draw nDraws
plausible values from each imputation. Step 2 would
then take into account uncertainty about both missing values and factor
scores. Bayesian methods can also be used to generate factor scores, as
available with the blavaan package, in which case plausible
values are simply saved parameters from the posterior distribution. See
Asparouhov and Muthen (2010) for further technical details and references.
Each returned data.frame
includes a case.idx
column that
indicates the corresponding rows in the data set to which the model was
originally fitted (unless the user requests only Level-2 variables). This
can be used to merge the plausible values with the original observed data,
but users should note that including any new variables in a Step-2 model
might not accurately account for their relationship(s) with factor scores
because they were not accounted for in the Step-1 model from which factor
scores were estimated.
If object
is a multilevel lavaan
model, users can request
plausible values for latent variables at particular levels of analysis by
setting the lavaan::lavPredict()
argument level=1
or
level=2
. If the level
argument is not passed via ...,
then both levels are returned in a single merged data set per draw. For
multilevel models, each returned data.frame
also includes a column
indicating to which cluster each row belongs (unless the user requests only
Level-2 variables).
Value
A list
of length nDraws
, each of which is a
data.frame
containing plausible values, which can be treated as
a list
of imputed data sets to be passed to runMI()
(see Examples). If object
is of class
lavaan.mi::lavaan.mi, the list
will be of length
nDraws*m
, where m
is the number of imputations.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Asparouhov, T. & Muthen, B. O. (2010). Plausible values for latent variables using Mplus. Technical Report. Retrieved from www.statmodel.com/download/Plausible.pdf
See Also
lavaan.mi::lavaan.mi()
, lavaan.mi::lavaan.mi
Examples
## example from ?cfa and ?lavPredict help pages
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit1 <- cfa(HS.model, data = HolzingerSwineford1939)
fs1 <- plausibleValues(fit1, nDraws = 3,
## lavPredict() can add only the modeled data
append.data = TRUE)
lapply(fs1, head)
## To merge factor scores to original data.frame (not just modeled data)
fs1 <- plausibleValues(fit1, nDraws = 3)
idx <- lavInspect(fit1, "case.idx") # row index for each case
if (is.list(idx)) idx <- do.call(c, idx) # for multigroup models
data(HolzingerSwineford1939) # copy data to workspace
HolzingerSwineford1939$case.idx <- idx # add row index as variable
## loop over draws to merge original data with factor scores
for (i in seq_along(fs1)) {
fs1[[i]] <- merge(fs1[[i]], HolzingerSwineford1939, by = "case.idx")
}
lapply(fs1, head)
## multiple-group analysis, in 2 steps
step1 <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = c("loadings","intercepts"))
PV.list <- plausibleValues(step1)
## subsequent path analysis
path.model <- ' visual ~ c(t1, t2)*textual + c(s1, s2)*speed '
if(requireNamespace("lavaan.mi")){
library(lavaan.mi)
step2 <- sem.mi(path.model, data = PV.list, group = "school")
## test equivalence of both slopes across groups
lavTestWald.mi(step2, constraints = 't1 == t2 ; s1 == s2')
}
## multilevel example from ?Demo.twolevel help page
model <- '
level: 1
fw =~ y1 + y2 + y3
fw ~ x1 + x2 + x3
level: 2
fb =~ y1 + y2 + y3
fb ~ w1 + w2
'
msem <- sem(model, data = Demo.twolevel, cluster = "cluster")
mlPVs <- plausibleValues(msem, nDraws = 3) # both levels by default
lapply(mlPVs, head, n = 10)
## only Level 1
mlPV1 <- plausibleValues(msem, nDraws = 3, level = 1)
lapply(mlPV1, head)
## only Level 2
mlPV2 <- plausibleValues(msem, nDraws = 3, level = 2)
lapply(mlPV2, head)
## example with 20 multiple imputations of missing data:
nPVs <- 5
nImps <- 20
if(requireNamespace("lavaan.mi")){
data(HS20imps, package = "lavaan.mi")
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
out2 <- cfa.mi(HS.model, data = HS20imps)
PVs <- plausibleValues(out2, nDraws = nPVs)
idx <- out2@Data@case.idx # can't use lavInspect() on lavaan.mi
## empty list to hold expanded imputations
impPVs <- list()
for (m in 1:nImps) {
HS20imps[[m]]["case.idx"] <- idx
for (i in 1:nPVs) {
impPVs[[ nPVs*(m - 1) + i ]] <- merge(HS20imps[[m]],
PVs[[ nPVs*(m - 1) + i ]],
by = "case.idx")
}
}
lapply(impPVs, head)
}
Plot a latent interaction
Description
This function plots line graphs representing the simple effect of the independent variable at given values of the moderator. For multigroup models, it will only generate a plot for one group, as specified in the function used to obtain the first argument.
Usage
plotProbe(object, xlim, xlab = "Indepedent Variable",
ylab = "Dependent Variable", legend = TRUE, legendArgs = list(), ...)
Arguments
object |
A |
xlim |
The vector of two numbers: the minimum and maximum values of the independent variable |
xlab |
The label of the x-axis |
ylab |
The label of the y-axis |
legend |
|
legendArgs |
|
... |
Any additional argument for the |
Value
None. This function will plot the simple main effect only.
Note
If the object
does not contain simple intercepts (i.e., if the
object$SimpleIntcept
element is NULL
), then all simple
intercepts are arbitrarily set to zero in order to plot the simple slopes.
This may not be consistent with the fitted model, but was (up until version
0.5-7) the default behavior when the y-intercept was fixed to 0. In this case,
although the relative steepness of simple slopes can still meaningfully be
compared, the relative vertical positions of lines at any point along the
x-axis should not be interpreted.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
latent variable interactions using the semTools
package.
Psych, 3(3), 322–335. doi:10.3390/psych3030024
See Also
- indProd(): For creating the indicator products with no centering, mean centering, double-mean centering, or residual centering
- probe2WayMC(): For probing the two-way latent interaction when the results are obtained from mean centering or double-mean centering
- probe3WayMC(): For probing the three-way latent interaction when the results are obtained from mean centering or double-mean centering
- probe2WayRC(): For probing the two-way latent interaction when the results are obtained from the residual-centering approach
- probe3WayRC(): For probing the three-way latent interaction when the results are obtained from the residual-centering approach
Examples
library(lavaan)
dat2wayMC <- indProd(dat2way, 1:3, 4:6)
model1 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ f1 + f2 + f12
f12 ~~ 0*f1
f12 ~~ 0*f2
x1 ~ 0*1
x4 ~ 0*1
x1.x4 ~ 0*1
x7 ~ 0*1
f1 ~ NA*1
f2 ~ NA*1
f12 ~ NA*1
f3 ~ NA*1
"
fitMC2way <- sem(model1, data = dat2wayMC, meanstructure = TRUE)
result2wayMC <- probe2WayMC(fitMC2way, nameX = c("f1", "f2", "f12"),
nameY = "f3", modVar = "f2", valProbe = c(-1, 0, 1))
plotProbe(result2wayMC, xlim = c(-2, 2))
dat3wayMC <- indProd(dat3way, 1:3, 4:6, 7:9)
model3 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f3 =~ x7 + x8 + x9
f12 =~ x1.x4 + x2.x5 + x3.x6
f13 =~ x1.x7 + x2.x8 + x3.x9
f23 =~ x4.x7 + x5.x8 + x6.x9
f123 =~ x1.x4.x7 + x2.x5.x8 + x3.x6.x9
f4 =~ x10 + x11 + x12
f4 ~ f1 + f2 + f3 + f12 + f13 + f23 + f123
f1 ~~ 0*f12
f1 ~~ 0*f13
f1 ~~ 0*f123
f2 ~~ 0*f12
f2 ~~ 0*f23
f2 ~~ 0*f123
f3 ~~ 0*f13
f3 ~~ 0*f23
f3 ~~ 0*f123
f12 ~~ 0*f123
f13 ~~ 0*f123
f23 ~~ 0*f123
x1 ~ 0*1
x4 ~ 0*1
x7 ~ 0*1
x10 ~ 0*1
x1.x4 ~ 0*1
x1.x7 ~ 0*1
x4.x7 ~ 0*1
x1.x4.x7 ~ 0*1
f1 ~ NA*1
f2 ~ NA*1
f3 ~ NA*1
f12 ~ NA*1
f13 ~ NA*1
f23 ~ NA*1
f123 ~ NA*1
f4 ~ NA*1
"
fitMC3way <- sem(model3, data = dat3wayMC, std.lv = FALSE,
meanstructure = TRUE)
result3wayMC <- probe3WayMC(fitMC3way, nameX = c("f1", "f2", "f3", "f12",
"f13", "f23", "f123"),
nameY = "f4", modVar = c("f1", "f2"),
valProbe1 = c(-1, 0, 1), valProbe2 = c(-1, 0, 1))
plotProbe(result3wayMC, xlim = c(-2, 2))
Plot the sampling distributions of RMSEA
Description
Plots the sampling distributions of RMSEA based on the noncentral chi-square distributions
Usage
plotRMSEAdist(rmsea, n, df, ptile = NULL, caption = NULL,
rmseaScale = TRUE, group = 1)
Arguments
rmsea |
The vector of RMSEA values to be plotted |
n |
Sample size of a dataset |
df |
Model degrees of freedom |
ptile |
The percentile rank of the distribution of the first RMSEA that users wish to plot a vertical line in the resulting graph |
caption |
The name vector of each element of |
rmseaScale |
If |
group |
The number of groups used to calculate RMSEA. |
Details
This function creates overlapping plots of the sampling distribution of
RMSEA based on the noncentral \chi^2 distribution (MacCallum, Browne, &
Sugawara, 1996). First, the noncentrality parameter (\lambda) is
calculated from RMSEA (Steiger, 1998; Dudgeon, 2004) by
\lambda = (N - 1)d\varepsilon^2 / K,
where N is the sample size, d is the model degrees of freedom, K is the
number of groups, and \varepsilon is the population RMSEA. Next, the
noncentral \chi^2 distribution with the specified df and noncentrality
parameter is plotted, so the x-axis represents the sample \chi^2 value.
The sample \chi^2 value can be transformed to the sample RMSEA scale
(\hat{\varepsilon}) by
\hat{\varepsilon} = \sqrt{K}\sqrt{\frac{\chi^2 - d}{(N - 1)d}},
where \chi^2 is the value obtained from the noncentral \chi^2 distribution.
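To make the transformation concrete, the following minimal R sketch (not the internal code of plotRMSEAdist(); object names are chosen for illustration) reproduces the two steps described above for a single RMSEA value:
## noncentrality parameter implied by a population RMSEA
N <- 200; d <- 20; K <- 1            # sample size, model df, number of groups
rmsea <- 0.05                        # population RMSEA
lambda <- (N - 1) * d * rmsea^2 / K  # noncentrality parameter
## density of the noncentral chi-square on the chi-square scale
x <- seq(0, 100, length.out = 500)
dens <- dchisq(x, df = d, ncp = lambda)
## transform the chi-square axis to the sample-RMSEA scale
eps.hat <- sqrt(K) * sqrt(pmax(x - d, 0) / ((N - 1) * d))
plot(eps.hat, dens, type = "l", xlab = "Sample RMSEA", ylab = "Density")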
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Dudgeon, P. (2004). A note on extending Steiger's (1998) multiple sample RMSEA adjustment to other noncentrality parameter-based statistic. Structural Equation Modeling, 11(3), 305–319. doi:10.1207/s15328007sem1103_1
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. doi:10.1037/1082-989X.1.2.130
Steiger, J. H. (1998). A note on multiple sample extensions of the RMSEA fit index. Structural Equation Modeling, 5(4), 411–419. doi:10.1080/10705519809540115
See Also
-
plotRMSEApower()
to plot the statistical power based on population RMSEA given the sample size -
findRMSEApower()
to find the statistical power based on population RMSEA given a sample size -
findRMSEAsamplesize()
to find the minimum sample size for a given statistical power based on population RMSEA
Examples
plotRMSEAdist(c(.05, .08), n = 200, df = 20, ptile = .95, rmseaScale = TRUE)
plotRMSEAdist(c(.05, .01), n = 200, df = 20, ptile = .05, rmseaScale = FALSE)
Plot power curves for RMSEA
Description
Plots power of RMSEA over a range of sample sizes
Usage
plotRMSEApower(rmsea0, rmseaA, df, nlow, nhigh, steps = 1, alpha = 0.05,
group = 1, ...)
Arguments
rmsea0 |
Null RMSEA |
rmseaA |
Alternative RMSEA |
df |
Model degrees of freedom |
nlow |
Lower sample size |
nhigh |
Upper sample size |
steps |
Increase in sample size for each iteration. Smaller values of steps will lead to more precise plots. However, smaller step sizes mean a longer run time. |
alpha |
Alpha level used in power calculations |
group |
The number of groups used to calculate RMSEA. |
... |
The additional arguments for the plot function. |
Details
This function creates a plot of power for RMSEA against a range of sample sizes. The plot places sample size on the horizontal axis and power on the vertical axis. The user should indicate the lower and upper values for sample size and the sample size between each estimate ("step size"). We strongly urge the user to read the sources below (see References) before proceeding. A web version of this function is available at: http://quantpsy.org/rmsea/rmseaplot.htm. This function is also implemented in the web application "power4SEM": https://sjak.shinyapps.io/power4SEM/
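As a rough illustration of the underlying calculation (following the MacCallum et al., 1996, approach for a test of not-close fit, i.e., rmseaA > rmsea0), power at a single sample size can be sketched in base R as follows. This is not the package's internal code; findRMSEApower() should be used in practice.
rmsea0 <- .025; rmseaA <- .075; df <- 23; N <- 300; alpha <- .05; K <- 1
ncp0 <- (N - 1) * df * rmsea0^2 / K             # noncentrality under the null RMSEA
ncpA <- (N - 1) * df * rmseaA^2 / K             # noncentrality under the alternative
crit <- qchisq(1 - alpha, df = df, ncp = ncp0)  # critical chi-square value
power <- 1 - pchisq(crit, df = df, ncp = ncpA)  # P(reject | alternative RMSEA)
power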
Value
Plot of power for RMSEA against a range of sample sizes
Author(s)
Alexander M. Schoemann (East Carolina University; schoemanna@ecu.edu)
Kristopher J. Preacher (Vanderbilt University; kris.preacher@vanderbilt.edu)
Donna L. Coffman (Pennsylvania State University; dlc30@psu.edu)
References
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19–35. doi:10.1037/1082-989X.11.1.19
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. doi:10.1037/1082-989X.1.2.130
MacCallum, R. C., Lee, T., & Browne, M. W. (2010). The issue of isopower in power analysis for tests of structural equation models. Structural Equation Modeling, 17(1), 23–41. doi:10.1080/10705510903438906
Preacher, K. J., Cai, L., & MacCallum, R. C. (2007). Alternatives to traditional model comparison strategies for covariance structure models. In T. D. Little, J. A. Bovaird, & N. A. Card (Eds.), Modeling contextual effects in longitudinal studies (pp. 33–62). Mahwah, NJ: Lawrence Erlbaum Associates.
Steiger, J. H. (1998). A note on multiple sample extensions of the RMSEA fit index. Structural Equation Modeling, 5(4), 411–419. doi:10.1080/10705519809540115
Steiger, J. H., & Lind, J. C. (1980, June). Statistically based tests for the number of factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City, IA.
Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L. (2021). Analytical power calculations for structural equation modeling: A tutorial and Shiny app. Behavior Research Methods, 53, 1385–1406. doi:10.3758/s13428-020-01479-0
See Also
-
plotRMSEAdist()
to visualize the RMSEA distributions -
findRMSEApower()
to find the statistical power based on population RMSEA given a sample size -
findRMSEAsamplesize()
to find the minimum sample size for a given statistical power based on population RMSEA
Examples
plotRMSEApower(rmsea0 = .025, rmseaA = .075, df = 23,
nlow = 100, nhigh = 500, steps = 10)
Plot power of nested model RMSEA
Description
Plot power of nested model RMSEA over a range of possible sample sizes.
Usage
plotRMSEApowernested(rmsea0A = NULL, rmsea0B = NULL, rmsea1A,
rmsea1B = NULL, dfA, dfB, nlow, nhigh, steps = 1, alpha = 0.05,
group = 1, ...)
Arguments
rmsea0A |
The |
rmsea0B |
The |
rmsea1A |
The |
rmsea1B |
The |
dfA |
degree of freedom of the more-restricted model |
dfB |
degree of freedom of the less-restricted model |
nlow |
Lower bound of sample size |
nhigh |
Upper bound of sample size |
steps |
Step size |
alpha |
The alpha level |
group |
The number of groups used in calculating RMSEA |
... |
The additional arguments for the plot function. |
Author(s)
Bell Clinton
Pavel Panko (Texas Tech University; pavel.panko@ttu.edu)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19–35. doi:10.1037/1082-989X.11.1.19
See Also
-
findRMSEApowernested()
to find the power for a given sample size in nested model comparison based on population RMSEA -
findRMSEAsamplesizenested()
to find the minimum sample size for a given statistical power in nested model comparison based on population RMSEA
Examples
plotRMSEApowernested(rmsea0A = 0, rmsea0B = 0, rmsea1A = 0.06,
rmsea1B = 0.05, dfA = 22, dfB = 20, nlow = 50,
nhigh = 500, steps = 1, alpha = .05, group = 1)
Combine sampling variability with parcel-allocation variability by pooling results across M parcel-allocations
Description
This function employs an iterative algorithm to pick the number of random
item-to-parcel allocations needed to meet user-defined stability criteria
for a fitted structural equation model (SEM) (see Details below for
more information). Pooled point and standard-error estimates from this SEM
can be outputted at this final selected number of allocations (however, it
is more efficient to save the allocations and treat them as multiple
imputations using lavaan.mi::lavaan.mi()
; see See Also for links with
examples). Additionally, new indices (see Sterba & Rights, 2016) are
outputted for assessing the relative contributions of parcel-allocation
variability vs. sampling variability in each estimate. At each iteration,
this function generates a given number of random item-to-parcel allocations,
fits a SEM to each allocation, pools estimates across allocations from that
iteration, and then assesses whether stopping criteria are met. If stopping
criteria are not met, the algorithm increments the number of allocations
used (generating all new allocations).
Usage
poolMAlloc(nPerPar, facPlc, nAllocStart, nAllocAdd = 0,
parceloutput = NULL, syntax, dataset, stopProp, stopValue,
selectParam = NULL, indices = "default", double = FALSE,
checkConv = FALSE, names = "default", leaveout = 0,
useTotalAlloc = FALSE, ...)
Arguments
nPerPar |
A list in which each element is a vector, corresponding to each factor, indicating sizes of parcels. If variables are left out of parceling, they should not be accounted for here (i.e., there should not be parcels of size "1"). |
facPlc |
A list of vectors, each corresponding to a factor, specifying the item indicators of that factor (whether included in parceling or not). Either variable names or column numbers. Variables not listed will not be modeled or included in output datasets. |
nAllocStart |
The number of random allocations of items to parcels to generate in the first iteration of the algorithm. |
nAllocAdd |
The number of allocations to add with each iteration of the
algorithm. Note that if only one iteration is desired, |
parceloutput |
Optional |
syntax |
lavaan syntax that defines the model. |
dataset |
Item-level dataset |
stopProp |
Value used in defining stopping criteria of the algorithm
( |
stopValue |
Value used in defining stopping criteria of the algorithm
( |
selectParam |
(Optional) A list of the pooled parameters to be used in
defining stopping criteria (i.e., |
indices |
Optional |
double |
(Optional) If set to |
checkConv |
(Optional) If set to TRUE, function will output pooled estimates and standard errors from 10 iterations post-convergence. |
names |
(Optional) A character vector containing the names of parceled variables. |
leaveout |
(Optional) A vector of variables to be left out of randomized parceling. Either variable names or column numbers are allowed. |
useTotalAlloc |
(Optional) If set to |
... |
Additional arguments to be passed to
|
Details
This function implements an algorithm for choosing the number of allocations (M; described in Sterba & Rights, 2016), pools point and standard-error estimates across these M allocations, and produces indices for assessing the relative contributions of parcel-allocation variability vs. sampling variability in each estimate.
To obtain pooled test statistics for model fit or model comparison, the
list of parcel allocations can be passed to lavaan.mi::lavaan.mi()
(find Examples on the help pages for parcelAllocation() and PAVranking()).
This function randomly generates a given number (nAllocStart
) of
item-to-parcel allocations, fits a SEM to each allocation, and then
increments the number of allocations used (by nAllocAdd
) until the
pooled point and standard-error estimates fulfill stopping criteria
(stopProp
and stopValue
, defined above). A summary of results
from the model that was fit to the M allocations is returned.
Additionally, this function outputs the proportion of allocations with solutions that converged (using a maximum likelihood estimator), as well as the proportion of allocations whose solutions converged and were proper. The converged and proper solutions among the final M allocations are used in computing pooled results.
Additionally, after each iteration of the algorithm, information useful in monitoring the algorithm is outputted. The number of allocations used at that iteration, the proportion of pooled parameter estimates meeting stopping criteria at the previous iteration, the proportion of pooled standard errors meeting stopping criteria at the previous iteration, and the runtime of that iteration are outputted. When stopping criteria are satisfied, the full set of results are outputted.
For further details on the benefits of the random allocation of items to parcels, see Sterba (2011) and Sterba & MacCallum (2010).
Value
Estimates |
A table containing pooled results across M
allocations at the iteration where stopping criteria were met. Columns
correspond to individual parameter name, pooled estimate, pooled standard
error, p value for a z test of the parameter, normal-theory |
Fit |
A table containing results related to model fit from the M allocations at the iteration where stopping criteria were met. Columns correspond to fit index names, the mean of each index across allocations, the SD of each fit index across allocations, the minimum, maximum and range of each fit index across allocations, and the percent of the M allocations where the chi-square test of absolute fit was significant. |
Proportions |
A table containing the proportion of the final M allocations that (a) met the optimizer convergence criteria and (b) converged to proper solutions. Note that pooled estimates, pooled standard errors, and other results are computed using only the converged, proper allocations. |
Stability |
The number of allocations (M) needed for stability, at which point the algorithm's stopping criteria (defined above) were met. |
Uncertainty |
Indices used to quantify uncertainty in estimates due to sample vs. allocation variability. A table containing individual parameter names, an estimate of the proportion of total variance of a pooled parameter estimate that is attributable to parcel-allocation variability (PPAV), and an estimate of the ratio of the between-allocation variance of a pooled parameter estimate to the within-allocation variance (RPAV). See Sterba & Rights (2016) for more detail. |
Time |
The total runtime of the function, in minutes. Note that the
total runtime will be greater when the specified model encounters
convergence problems for some allocations, as is the case with the
|
Author(s)
Jason D. Rights (Vanderbilt University; jason.d.rights@vanderbilt.edu)
The author would also like to credit Corbin Quick and Alexander Schoemann
for providing the original parcelAllocation()
function (prior to its
revision by Terrence D. Jorgensen) on which this function is based.
References
Sterba, S. K. (2011). Implications of parcel-allocation variability for comparing fit of item-solutions and parcel-solutions. Structural Equation Modeling, 18(4), 554–577. doi:10.1080/10705511.2011.607073
Sterba, S. K., & MacCallum, R. C. (2010). Variability in parameter estimates and model fit across random allocations of items to parcels. Multivariate Behavioral Research, 45(2), 322–358. doi:10.1080/00273171003680302
Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation variability in practice: Combining sources of uncertainty and choosing the number of allocations. Multivariate Behavioral Research, 51(2–3), 296–313. doi:10.1080/00273171.2016.1144502
Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model selection: Parcel-allocation variability in model ranking. Psychological Methods, 22(1), 47–68. doi:10.1037/met0000067
See Also
lavaan.mi::lavaan.mi()
for treating allocations as multiple imputations
to pool results across allocations. See Examples on help pages for
parcelAllocation()
(when fitting a single model) and PAVranking()
(when comparing 2 models).
Examples
## lavaan syntax: a 2-correlated-factor CFA model
## to be fit to parceled data
parmodel <- '
f1 =~ NA*p1f1 + p2f1 + p3f1
f2 =~ NA*p1f2 + p2f2 + p3f2
p1f1 ~ 1
p2f1 ~ 1
p3f1 ~ 1
p1f2 ~ 1
p2f2 ~ 1
p3f2 ~ 1
p1f1 ~~ p1f1
p2f1 ~~ p2f1
p3f1 ~~ p3f1
p1f2 ~~ p1f2
p2f2 ~~ p2f2
p3f2 ~~ p3f2
f1 ~~ 1*f1
f2 ~~ 1*f2
f1 ~~ f2
'
## specify items for each factor
f1name <- colnames(simParcel)[1:9]
f2name <- colnames(simParcel)[10:18]
## run function
poolMAlloc(nPerPar = list(c(3,3,3), c(3,3,3)),
facPlc = list(f1name, f2name), nAllocStart = 10, nAllocAdd = 10,
syntax = parmodel, dataset = simParcel, stopProp = .03,
stopValue = .03, selectParam = c(1:6, 13:18, 21),
names = list("p1f1","p2f1","p3f1","p1f2","p2f2","p3f2"),
double = FALSE, useTotalAlloc = FALSE)
## See examples on ?parcelAllocation and ?PAVranking for how to obtain
## pooled test statistics and other pooled lavaan output.
## Details provided in Sterba & Rights (2016).
Probing two-way interaction on the no-centered or mean-centered latent interaction
Description
Probing interaction for simple intercept and simple slope for the no-centered or mean-centered latent two-way interaction
Usage
probe2WayMC(fit, nameX, nameY, modVar, valProbe, group = 1L,
omit.imps = c("no.conv", "no.se"))
Arguments
fit |
A fitted lavaan::lavaan or lavaan.mi::lavaan.mi object with a latent 2-way interaction. |
nameX |
|
nameY |
The name of factor that is used as the dependent variable. |
modVar |
The name of factor that is used as a moderator. The effect of
the other independent factor will be probed at each value of the
moderator variable listed in |
valProbe |
The values of the moderator that will be used to probe the effect of the focal predictor. |
group |
In multigroup models, the label of the group for which the
results will be returned. Must correspond to one of
|
omit.imps |
|
Details
Before using this function, researchers need to make the products of the
indicators between the first-order factors using mean centering (Marsh,
Wen, & Hau, 2004). Note that double-mean centering may not be appropriate
for probing interactions if researchers are interested in simple
intercepts. The mean or double-mean centering can be done by the
indProd() function. The indicator products can be made using all possible
combinations or the matched-pair approach (Marsh et al., 2004). Next, the
hypothesized model, including the regression with the latent interaction,
is fitted to all original indicators and the product terms. See the
example below for how to fit the product term. Once the lavaan result is
obtained, this function can be used to probe the interaction.
Let the latent interaction model regressing the dependent variable (Y)
on the independent variable (X) and the moderator (Z) be
Y = b_0 + b_1X + b_2Z + b_3XZ + r,
where b_0 is the estimated intercept (the expected value of Y when both
X and Z are 0), b_1 is the effect of X when Z is 0, b_2 is the effect of
Z when X is 0, b_3 is the interaction effect between X and Z, and r is
the residual term.
To probe a two-way interaction, the simple intercept of the independent
variable at each value of the moderator (Aiken & West, 1991; Cohen, Cohen,
West, & Aiken, 2003; Preacher, Curran, & Bauer, 2006) can be obtained by
b_{0|X = 0, Z} = b_0 + b_2 Z.
The simple slope of the independent variable at each value of the
moderator can be obtained by
b_{X|Z} = b_1 + b_3 Z.
The variance of the simple intercept formula is
Var\left(b_{0|X = 0, Z}\right) = Var\left(b_0\right) + 2Z \times Cov\left(b_0, b_2\right) + Z^2 \times Var\left(b_2\right),
where Var denotes the variance of a parameter estimate and Cov denotes
the covariance of two parameter estimates. The variance of the simple
slope formula is
Var\left(b_{X|Z}\right) = Var\left(b_1\right) + 2Z \times Cov\left(b_1, b_3\right) + Z^2 \times Var\left(b_3\right).
The Wald z statistic is used as the test statistic (even for objects of
class lavaan.mi::lavaan.mi).
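The following minimal R sketch applies these formulas by hand, using hypothetical coefficient estimates and an assumed covariance matrix of those estimates; probe2WayMC() extracts the corresponding quantities from the fitted lavaan object automatically.
b1 <- 0.40; b3 <- 0.25                       # hypothetical effects of X and XZ
vcov.b <- matrix(c(0.010, 0.002,
                   0.002, 0.008), nrow = 2)  # hypothetical Var/Cov of (b1, b3)
Z <- c(-1, 0, 1)                             # values of the moderator
slope    <- b1 + b3 * Z
slope.se <- sqrt(vcov.b[1, 1] + 2 * Z * vcov.b[1, 2] + Z^2 * vcov.b[2, 2])
z.stat   <- slope / slope.se
p.value  <- 2 * pnorm(abs(z.stat), lower.tail = FALSE)
cbind(Z, slope, slope.se, z.stat, p.value)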
Value
A list with two elements:
-
SimpleIntercept
: The simple intercepts given each value of the moderator. -
SimpleSlope
: The simple slopes given each value of the moderator.
In each element, the first column represents the values of the moderator
specified in the valProbe
argument. The second column is the simple
intercept or simple slope. The third column is the SE of the simple
intercept or simple slope. The fourth column is the Wald (z)
statistic, and the fifth column is the associated p value testing
the null hypothesis that each simple intercept or slope is 0.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Tutorial:
Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
latent variable interactions using the semTools
package.
Psych, 3(3), 322–335. doi:10.3390/psych3030024
Background literature:
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). New York, NY: Routledge.
Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. doi:10.1037/1082-989X.9.3.275
Preacher, K. J., Curran, P. J., & Bauer, D. J. (2006). Computational tools for probing interactions in multiple linear regression, multilevel modeling, and latent curve analysis. Journal of Educational and Behavioral Statistics, 31(4), 437–448. doi:10.3102/10769986031004437
See Also
-
indProd()
For creating the indicator products with no centering, mean centering, double-mean centering, or residual centering. -
probe3WayMC()
For probing the three-way latent interaction when the results are obtained from mean-centering, or double-mean centering -
probe2WayRC()
For probing the two-way latent interaction when the results are obtained from residual-centering approach. -
probe3WayRC()
For probing the three-way latent interaction when the results are obtained from the residual-centering approach. -
plotProbe()
Plot the simple intercepts and slopes of the latent interaction.
Examples
dat2wayMC <- indProd(dat2way, 1:3, 4:6) # double mean centered by default
model1 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ f1 + f2 + f12
f12 ~~ 0*f1 + 0*f2 # not necessary, but implied by double mean centering
"
fitMC2way <- sem(model1, data = dat2wayMC, meanstructure = TRUE)
summary(fitMC2way)
probe2WayMC(fitMC2way, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1))
## can probe multigroup models, one group at a time
dat2wayMC$g <- 1:2
model2 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ c(b1.g1, b1.g2)*f1 + c(b2.g1, b2.g2)*f2 + c(b12.g1, b12.g2)*f12
f12 ~~ 0*f1 + 0*f2
"
fit2 <- sem(model2, data = dat2wayMC, group = "g")
probe2WayMC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1)) # group = 1 by default
probe2WayMC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1), group = 2)
Probing two-way interaction on the residual-centered latent interaction
Description
Probing interaction for simple intercept and simple slope for the residual-centered latent two-way interaction (Geldhof et al., 2013)
Usage
probe2WayRC(fit, nameX, nameY, modVar, valProbe, group = 1L,
omit.imps = c("no.conv", "no.se"))
Arguments
fit |
A fitted lavaan::lavaan or lavaan.mi::lavaan.mi object with a latent 2-way interaction. |
nameX |
|
nameY |
The name of factor that is used as the dependent variable. |
modVar |
The name of factor that is used as a moderator. The effect of
the other independent factor will be probed at each value of the
moderator variable listed in |
valProbe |
The values of the moderator that will be used to probe the effect of the focal predictor. |
group |
In multigroup models, the label of the group for which the
results will be returned. Must correspond to one of
|
omit.imps |
|
Details
Before using this function, researchers need to make the products of the
indicators between the first-order factors and residualize the products
by the original indicators (Lance, 1988; Little, Bovaird, & Widaman,
2006). The process can be automated by the indProd() function. Note that
the indicator products can be made using all possible combinations or the
matched-pair approach (Marsh et al., 2004). Next, the hypothesized model,
including the regression with the latent interaction, is fitted to all
original indicators and the product terms. To use this function, the
model must be fit with a mean structure. See the example below for how to
fit the product term. Once the lavaan result is obtained, this function
can be used to probe the interaction.
The probing process on residual-centered latent interaction is based on
transforming the residual-centered result into the no-centered result. See
Geldhof et al. (2013) for further details. Note that this approach is based
on a strong assumption that the first-order latent variables are normally
distributed. The probing process is applied after the no-centered result
(parameter estimates and their covariance matrix among parameter estimates)
has been computed. See the probe2WayMC()
for further details.
Value
A list with two elements:
-
SimpleIntercept
: The simple intercepts given each value of the moderator. -
SimpleSlope
: The simple slopes given each value of the moderator.
In each element, the first column represents the values of the moderators
specified in the valProbe
argument. The second column is the simple
intercept or simple slope. The third column is the standard error of the
simple intercept or slope. The fourth column is the Wald (z)
statistic, and the fifth column is the associated p value testing
the null hypothesis that each simple intercept or slope is 0.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Tutorial:
Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
latent variable interactions using the semTools
package.
Psych, 3(3), 322–335. doi:10.3390/psych3030024
Background literature:
Lance, C. E. (1988). Residual centering, exploratory and confirmatory moderator analysis, and decomposition of effects in path models containing interactions. Applied Psychological Measurement, 12(2), 163–175. doi:10.1177/014662168801200205
Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions. Structural Equation Modeling, 13(4), 497–519. doi:10.1207/s15328007sem1304_1
Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. doi:10.1037/1082-989X.9.3.275
Geldhof, G. J., Pornprasertmanit, S., Schoemann, A. M., & Little, T. D. (2013). Orthogonalizing through residual centering: Extended applications and caveats. Educational and Psychological Measurement, 73(1), 27–46. doi:10.1177/0013164412445473
See Also
-
indProd()
For creating the indicator products with no centering, mean centering, double-mean centering, or residual centering. -
probe2WayMC()
For probing the two-way latent interaction when the results are obtained from mean-centering, or double-mean centering -
probe3WayMC()
For probing the three-way latent interaction when the results are obtained from mean-centering, or double-mean centering -
probe3WayRC()
For probing the three-way latent interaction when the results are obtained from the residual-centering approach. -
plotProbe()
Plot the simple intercepts and slopes of the latent interaction.
Examples
dat2wayRC <- orthogonalize(dat2way, 1:3, 4:6)
model1 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ f1 + f2 + f12
f12 ~~ 0*f1 + 0*f2
x1 + x4 + x1.x4 + x7 ~ 0*1 # identify latent means
f1 + f2 + f12 + f3 ~ NA*1
"
fitRC2way <- sem(model1, data = dat2wayRC, meanstructure = TRUE)
summary(fitRC2way)
probe2WayRC(fitRC2way, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1))
## can probe multigroup models, one group at a time
dat2wayRC$g <- 1:2
model2 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ c(b1.g1, b1.g2)*f1 + c(b2.g1, b2.g2)*f2 + c(b12.g1, b12.g2)*f12
f12 ~~ 0*f1 + 0*f2
x1 + x4 + x1.x4 + x7 ~ 0*1 # identify latent means
f1 + f2 + f12 ~ NA*1
f3 ~ NA*1 + c(b0.g1, b0.g2)*1
"
fit2 <- sem(model2, data = dat2wayRC, group = "g")
probe2WayRC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1)) # group = 1 by default
probe2WayRC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1), group = 2)
Probing three-way interaction on the no-centered or mean-centered latent interaction
Description
Probing interaction for simple intercept and simple slope for the no-centered or mean-centered latent three-way interaction
Usage
probe3WayMC(fit, nameX, nameY, modVar, valProbe1, valProbe2, group = 1L,
omit.imps = c("no.conv", "no.se"))
Arguments
fit |
A fitted lavaan::lavaan or lavaan.mi::lavaan.mi object with a latent 3-way interaction. |
nameX |
|
nameY |
The name of factor that is used as the dependent variable. |
modVar |
The name of two factors that are used as the moderators. The effect of the independent factor will be probed at each combination of the moderator variables' chosen values. |
valProbe1 |
The values of the first moderator that will be used to probe the effect of the independent factor. |
valProbe2 |
The values of the second moderator that will be used to probe the effect of the independent factor. |
group |
In multigroup models, the label of the group for which the
results will be returned. Must correspond to one of
|
omit.imps |
|
Details
Before using this function, researchers need to make the products of the
indicators between the first-order factors using mean centering (Marsh,
Wen, & Hau, 2004). Note that double-mean centering may not be appropriate
for probing interactions if researchers are interested in simple
intercepts. The mean or double-mean centering can be done by the
indProd() function. The indicator products can be made using all possible
combinations or the matched-pair approach (Marsh et al., 2004). Next, the
hypothesized model, including the regression with the latent interaction,
is fitted to all original indicators and the product terms. See the
example below for how to fit the product term. Once the lavaan result is
obtained, this function can be used to probe the interaction.
Let the latent interaction model regressing the dependent variable (Y)
on the independent variable (X) and two moderators (Z and W) be
Y = b_0 + b_1X + b_2Z + b_3W + b_4XZ + b_5XW + b_6ZW + b_7XZW + r,
where b_0 is the estimated intercept (the expected value of Y when X, Z,
and W are 0), b_1 is the effect of X when Z and W are 0, b_2 is the
effect of Z when X and W are 0, b_3 is the effect of W when X and Z are
0, b_4 is the interaction effect between X and Z when W is 0, b_5 is the
interaction effect between X and W when Z is 0, b_6 is the interaction
effect between Z and W when X is 0, b_7 is the three-way interaction
effect between X, Z, and W, and r is the residual term.
To probe a three-way interaction, the simple intercept of the independent
variable at specific values of the moderators (Aiken & West, 1991) can be
obtained by
b_{0|X = 0, Z, W} = b_0 + b_2Z + b_3W + b_6ZW.
The simple slope of the independent variable at specific values of the
moderators can be obtained by
b_{X|Z, W} = b_1 + b_4Z + b_5W + b_7ZW.
The variance of the simple intercept formula is
Var\left(b_{0|X = 0, Z, W}\right) = Var\left(b_0\right) + Z^2Var\left(b_2\right) + W^2Var\left(b_3\right) + Z^2W^2Var\left(b_6\right) + 2ZCov\left(b_0, b_2\right) + 2WCov\left(b_0, b_3\right) + 2ZWCov\left(b_0, b_6\right) + 2ZWCov\left(b_2, b_3\right) + 2Z^2WCov\left(b_2, b_6\right) + 2ZW^2Cov\left(b_3, b_6\right),
where Var denotes the variance of a parameter estimate and Cov denotes
the covariance of two parameter estimates. The variance of the simple
slope formula is
Var\left(b_{X|Z, W}\right) = Var\left(b_1\right) + Z^2Var\left(b_4\right) + W^2Var\left(b_5\right) + Z^2W^2Var\left(b_7\right) + 2ZCov\left(b_1, b_4\right) + 2WCov\left(b_1, b_5\right) + 2ZWCov\left(b_1, b_7\right) + 2ZWCov\left(b_4, b_5\right) + 2Z^2WCov\left(b_4, b_7\right) + 2ZW^2Cov\left(b_5, b_7\right).
Wald z statistics are calculated (even for objects of class
lavaan.mi::lavaan.mi) to test the null hypotheses that simple intercepts
or slopes are 0.
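As in the two-way case, these formulas can be applied by hand. The sketch below uses hypothetical estimates and, for brevity, writes the simple-slope variance in matrix form (a V a' with a = (1, Z, W, ZW)), which expands to the formula above; probe3WayMC() computes this from the fitted lavaan object, including all covariance terms.
b <- c(b1 = 0.30, b4 = 0.20, b5 = 0.15, b7 = 0.10)  # coefficients involving X (hypothetical)
Z <- 1; W <- -1                                     # chosen moderator values
slope <- b["b1"] + b["b4"] * Z + b["b5"] * W + b["b7"] * Z * W
a <- c(1, Z, W, Z * W)
V <- diag(c(0.010, 0.008, 0.008, 0.005))            # hypothetical Var/Cov of (b1, b4, b5, b7)
slope.se <- sqrt(drop(t(a) %*% V %*% a))
c(slope = unname(slope), SE = slope.se)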
Value
A list with two elements:
-
SimpleIntercept
: The model-implied intercepts given each combination of moderator values. -
SimpleSlope
: The model-implied slopes given each combination of moderator values.
In each element, the first column represents values of the first moderator
specified in the valProbe1
argument. The second column represents
values of the second moderator specified in the valProbe2
argument.
The third column is the simple intercept or simple slope. The fourth column
is the standard error of the simple intercept or simple slope. The fifth
column is the Wald (z) statistic, and the sixth column is its
associated p value to test the null hypothesis that each simple
intercept or simple slope equals 0.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Tutorial:
Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
latent variable interactions using the semTools
package.
Psych, 3(3), 322–335. doi:10.3390/psych3030024
Background literature:
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. doi:10.1037/1082-989X.9.3.275
See Also
-
indProd()
For creating the indicator products with no centering, mean centering, double-mean centering, or residual centering. -
probe2WayMC()
For probing the two-way latent interaction when the results are obtained from mean-centering, or double-mean centering -
probe2WayRC()
For probing the two-way latent interaction when the results are obtained from residual-centering approach. -
probe3WayRC()
For probing the three-way latent interaction when the results are obtained from the residual-centering approach. -
plotProbe()
Plot the simple intercepts and slopes of the latent interaction.
Examples
dat3wayMC <- indProd(dat3way, 1:3, 4:6, 7:9)
model3 <- " ## define latent variables
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f3 =~ x7 + x8 + x9
## 2-way interactions
f12 =~ x1.x4 + x2.x5 + x3.x6
f13 =~ x1.x7 + x2.x8 + x3.x9
f23 =~ x4.x7 + x5.x8 + x6.x9
## 3-way interaction
f123 =~ x1.x4.x7 + x2.x5.x8 + x3.x6.x9
## outcome variable
f4 =~ x10 + x11 + x12
## latent regression model
f4 ~ b1*f1 + b2*f2 + b3*f3 + b12*f12 + b13*f13 + b23*f23 + b123*f123
## orthogonal terms among predictors
## (not necessary, but implied by double mean centering)
f1 ~~ 0*f12 + 0*f13 + 0*f123
f2 ~~ 0*f12 + 0*f23 + 0*f123
f3 ~~ 0*f13 + 0*f23 + 0*f123
f12 + f13 + f23 ~~ 0*f123
"
fitMC3way <- sem(model3, data = dat3wayMC, meanstructure = TRUE)
summary(fitMC3way)
probe3WayMC(fitMC3way, nameX = c("f1" ,"f2" ,"f3",
"f12","f13","f23", # this order matters!
"f123"), # 3-way interaction
nameY = "f4", modVar = c("f1", "f2"),
valProbe1 = c(-1, 0, 1), valProbe2 = c(-1, 0, 1))
Probing three-way interaction on the residual-centered latent interaction
Description
Probing interaction for simple intercept and simple slope for the residual-centered latent three-way interaction (Geldhof et al., 2013)
Usage
probe3WayRC(fit, nameX, nameY, modVar, valProbe1, valProbe2, group = 1L,
omit.imps = c("no.conv", "no.se"))
Arguments
fit |
A fitted lavaan::lavaan or lavaan.mi::lavaan.mi object with a latent 3-way interaction. |
nameX |
|
nameY |
The name of factor that is used as the dependent variable. |
modVar |
The name of two factors that are used as the moderators. The effect of the independent factor on each combination of the moderator variable values will be probed. |
valProbe1 |
The values of the first moderator that will be used to probe the effect of the independent factor. |
valProbe2 |
The values of the second moderator that will be used to probe the effect of the independent factor. |
group |
In multigroup models, the label of the group for which the
results will be returned. Must correspond to one of
|
omit.imps |
|
Details
Before using this function, researchers need to make the products of the
indicators between the first-order factors and residualize the products
by the original indicators (Lance, 1988; Little, Bovaird, & Widaman,
2006). The process can be automated by the indProd() function. Note that
the indicator products can be made using all possible combinations or the
matched-pair approach (Marsh et al., 2004). Next, the hypothesized model,
including the regression with the latent interaction, is fitted to all
original indicators and the product terms (Geldhof et al., 2013). To use
this function, the model must be fit with a mean structure. See the
example below for how to fit the product term. Once the lavaan result is
obtained, this function can be used to probe the interaction.
The probing process on residual-centered latent interaction is based on
transforming the residual-centered result into the no-centered result. See
Geldhof et al. (2013) for further details. Note that this approach is based on
a strong assumption that the first-order latent variables are normally
distributed. The probing process is applied after the no-centered result
(parameter estimates and their covariance matrix among parameter estimates)
has been computed. See the probe3WayMC()
for further details.
Value
A list with two elements:
-
SimpleIntercept
: The model-implied intercepts given each combination of moderator values. -
SimpleSlope
: The model-implied slopes given each combination of moderator values.
In each element, the first column represents values of the first moderator
specified in the valProbe1
argument. The second column represents
values of the second moderator specified in the valProbe2
argument.
The third column is the simple intercept or simple slope. The fourth column
is the SE of the simple intercept or simple slope. The fifth column
is the Wald (z) statistic, and the sixth column is its associated
p value to test the null hypothesis that each simple intercept or
simple slope equals 0.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Tutorial:
Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
latent variable interactions using the semTools
package.
Psych, 3(3), 322–335. doi:10.3390/psych3030024
Background literature:
Geldhof, G. J., Pornprasertmanit, S., Schoemann, A., & Little, T. D. (2013). Orthogonalizing through residual centering: Extended applications and caveats. Educational and Psychological Measurement, 73(1), 27–46. doi:10.1177/0013164412445473
Lance, C. E. (1988). Residual centering, exploratory and confirmatory moderator analysis, and decomposition of effects in path models containing interactions. Applied Psychological Measurement, 12(2), 163–175. doi:10.1177/014662168801200205
Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions. Structural Equation Modeling, 13(4), 497–519. doi:10.1207/s15328007sem1304_1
Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. doi:10.1037/1082-989X.9.3.275
Pornprasertmanit, S., Schoemann, A. M., Geldhof, G. J., & Little, T. D. (submitted). Probing latent interaction estimated with a residual centering approach.
See Also
-
indProd()
For creating the indicator products with no centering, mean centering, double-mean centering, or residual centering. -
probe2WayMC()
For probing the two-way latent interaction when the results are obtained from mean-centering, or double-mean centering -
probe3WayMC()
For probing the three-way latent interaction when the results are obtained from mean-centering, or double-mean centering -
probe2WayRC()
For probing the two-way latent interaction when the results are obtained from residual-centering approach. -
plotProbe()
Plot the simple intercepts and slopes of the latent interaction.
Examples
dat3wayRC <- orthogonalize(dat3way, 1:3, 4:6, 7:9)
model3 <- " ## define latent variables
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f3 =~ x7 + x8 + x9
## 2-way interactions
f12 =~ x1.x4 + x2.x5 + x3.x6
f13 =~ x1.x7 + x2.x8 + x3.x9
f23 =~ x4.x7 + x5.x8 + x6.x9
## 3-way interaction
f123 =~ x1.x4.x7 + x2.x5.x8 + x3.x6.x9
## outcome variable
f4 =~ x10 + x11 + x12
## latent regression model
f4 ~ b1*f1 + b2*f2 + b3*f3 + b12*f12 + b13*f13 + b23*f23 + b123*f123
## orthogonal terms among predictors
f1 ~~ 0*f12 + 0*f13 + 0*f123
f2 ~~ 0*f12 + 0*f23 + 0*f123
f3 ~~ 0*f13 + 0*f23 + 0*f123
f12 + f13 + f23 ~~ 0*f123
## identify latent means
x1 + x4 + x7 + x1.x4 + x1.x7 + x4.x7 + x1.x4.x7 + x10 ~ 0*1
f1 + f2 + f3 + f12 + f13 + f23 + f123 + f4 ~ NA*1
"
fitRC3way <- sem(model3, data = dat3wayRC, meanstructure = TRUE)
summary(fitRC3way)
probe3WayRC(fitRC3way, nameX = c("f1" ,"f2" ,"f3",
"f12","f13","f23", # this order matters!
"f123"), # 3-way interaction
nameY = "f4", modVar = c("f1", "f2"),
valProbe1 = c(-1, 0, 1), valProbe2 = c(-1, 0, 1))
Quark
Description
The quark function provides researchers with the ability to calculate
and include component scores that take into account the variance in the
original dataset and all of the interaction and polynomial effects among
its variables.
Usage
quark(data, id, order = 1, silent = FALSE, ...)
Arguments
data |
The data frame is a required component for |
id |
Identifiers and dates within the dataset will need to be
acknowledged as |
order |
Order is an optional argument provided by quark that can be
used when the imputation procedures in mice fail. Under some circumstances,
mice cannot calculate missing values due to issues with extreme missingness.
Should an error present itself stating a failure due to not having any
columns selected, set the argument |
silent |
If |
... |
additional arguments to pass to |
Details
The quark function calculates these component scores by first filling in
the data via multiple-imputation methods and then expanding the dataset
by aggregating the non-overlapping interaction and polynomial effects
between variables, calculating the mean of each such term. The
multiple-imputation methods include iterative sampling with group-mean
substitution, and multiple imputation using a polytomous regression
algorithm (mice). During the expansion process, the dataset is expanded
to three times its original width. The first third of the dataset
contains all of the original data post imputation, the second third
contains the means of the polynomial effects (squares and cubes), and the
final third contains the means of the non-overlapping interaction
effects. A full principal component analysis is then conducted and the
individual components are retained. The subsequent combinequark()
function gives researchers control over how many components to extract
and retain. The function returns the dataset as submitted (with missing
values) and the component scores as requested, for a more accurate
multiple imputation in subsequent steps.
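The expansion step can be illustrated conceptually with base R. This is only a sketch of the idea, using squares and pairwise products of a complete numeric data set; the actual quark() implementation also performs the imputation step, includes cubes, and uses a PCA routine adapted from FactoMineR.
X <- scale(na.omit(attitude))          # any numeric data (listwise deletion for brevity)
squares <- X^2                         # polynomial terms (squares)
prods <- do.call(cbind, combn(ncol(X), 2, simplify = FALSE,
                              FUN = function(idx) X[, idx[1]] * X[, idx[2]]))
big <- cbind(X, squares, prods)        # expanded matrix
pc <- prcomp(big)                      # principal components
cumsum(pc$sdev^2) / sum(pc$sdev^2)     # cumulative proportion of variance explained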
Value
The quark function returns a list with seven components.
ID Columns |
Is a vector of the identifier columns entered when running quark. |
ID Variables |
Is a subset of the dataset that contains the identifiers as acknowledged when running quark. |
Used Data |
Is a matrix / dataframe of the data provided by user as the basis for quark to process. |
Imputed Data |
Is a matrix / dataframe of the data after the multiple method imputation process. |
Big Matrix |
Is the expanded product and polynomial matrix. |
Principal Components |
Is the entire dataframe of principal components for the dataset. This dataset will have the same number of rows of the big matrix, but will have 1 less column (as is the case with principal component analyses). |
Percent Variance Explained |
Is a vector of the percent variance explained with each column of principal components. |
Author(s)
Steven R. Chesnut (University of Southern Mississippi; Steven.Chesnut@usm.edu)
Danny Squire (Texas Tech University)
Terrence D. Jorgensen (University of Amsterdam)
The PCA code is copied and modified from the FactoMineR
package.
References
Howard, W. J., Rhemtulla, M., & Little, T. D. (2015). Using Principal Components as Auxiliary Variables in Missing Data Estimation. Multivariate Behavioral Research, 50(3), 285–299. doi:10.1080/00273171.2014.999267
See Also
Examples
set.seed(123321)
dat <- HolzingerSwineford1939[,7:15]
misspat <- matrix(runif(nrow(dat) * 9) < 0.3, nrow(dat))
dat[misspat] <- NA
dat <- cbind(HolzingerSwineford1939[,1:3], dat)
quark.list <- quark(data = dat, id = c(1, 2))
final.data <- combinequark(quark = quark.list, percent = 80)
## Example to rerun quark after imputation failure:
quark.list <- quark(data = dat, id = c(1, 2), order = 2)
Composite Reliability using SEM
Description
Calculate composite reliability from estimated factor-model parameters
Usage
reliability(object, what = c("alpha", "omega", "omega2", "omega3", "ave"),
return.total = FALSE, dropSingle = TRUE, omit.factors = character(0),
omit.indicators = character(0), omit.imps = c("no.conv", "no.se"))
Arguments
object |
A lavaan::lavaan or lavaan.mi::lavaan.mi object, expected to contain only exogenous common factors (i.e., a CFA model). |
what |
|
return.total |
|
dropSingle |
|
omit.factors |
|
omit.indicators |
|
omit.imps |
|
Details
The coefficient alpha (Cronbach, 1951) can be calculated by
\alpha = \frac{k}{k - 1}\left[ 1 - \frac{\sum^{k}_{i = 1} \sigma_{ii}}{\sum^{k}_{i = 1} \sigma_{ii} + 2\sum_{i < j} \sigma_{ij}} \right],
where k is the number of items in a factor, \sigma_{ii} is the observed
variance of item i, and \sigma_{ij} is the observed covariance of items
i and j.
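For example, coefficient alpha can be computed directly from an observed covariance matrix using the formula above (a sketch for illustration; reliability() computes it from the fitted lavaan object, and the lavaan package is assumed to be loaded for the example data):
S <- cov(HolzingerSwineford1939[ , c("x1", "x2", "x3")])
k <- ncol(S)
k / (k - 1) * (1 - sum(diag(S)) / sum(S))  # sum(S) = sum of variances + 2 * sum of covariances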
Several coefficients for factor-analysis reliability have been termed
"omega", which Cho (2021) argues is a misleading misnomer, proposing
instead that \rho represent them all, differentiated by descriptive
subscripts. In our package, we number \omega based on commonly applied
calculations. Bentler (1968) first introduced factor-analysis reliability
for a unidimensional factor model with congeneric indicators. However,
assuming there are no cross-loadings in a multidimensional CFA, this
reliability coefficient can be calculated for each factor in the model:
\omega_1 = \frac{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2} Var\left( \psi \right)}{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2} Var\left( \psi \right) + \sum^{k}_{i = 1} \theta_{ii} + 2\sum_{i < j} \theta_{ij}},
where \lambda_i is the factor loading of item i, \psi is the factor
variance, \theta_{ii} is the variance of measurement errors of item i,
and \theta_{ij} is the covariance of measurement errors from items i and
j. McDonald (1999) later referred to this and other reliability
coefficients as "omega", which is a source of confusion when reporting
coefficients (Cho, 2021).
The additional coefficients generalize the first formula by accounting
for multidimensionality (possibly with cross-loadings) and correlated
errors. By setting return.total=TRUE, one can estimate reliability for a
single composite calculated using all indicators in the multidimensional
CFA (Bentler, 1972, 2009). "omega2" is calculated by
\omega_2 = \frac{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2} Var\left( \psi \right)}{\bold{1}^\prime \hat{\Sigma} \bold{1}},
where \hat{\Sigma} is the model-implied covariance matrix and \bold{1} is
the k-dimensional vector of 1s. The first and second coefficients omega
will have the same value per factor in models with simple structure, but
they differ when there are (e.g.) cross-loadings or method factors. The
first coefficient omega can be viewed as the reliability controlling for
the other factors (like \eta^2_{partial} in ANOVA). The second
coefficient omega can be viewed as the unconditional reliability (like
\eta^2 in ANOVA).
The "omega3"
coefficient (McDonald, 1999), sometimes referred to as
hierarchical omega, can be calculated by
\omega_3 =\frac{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
Var\left( \psi \right)}{\bold{1}^\prime \Sigma \bold{1}},
where \Sigma
is the observed covariance matrix. If the model fits the
data well, \omega_3
will be similar to \omega_2
. Note that if
there is a directional effect in the model, all coefficients are calculated
from total factor variances: lavInspect(object, "cov.lv")
.
In conclusion, \omega_1, \omega_2, and \omega_3 differ in their
denominators. The denominator of the first formula assumes a congeneric
factor model in which measurement errors are not correlated. The second
formula accounts for correlated measurement errors. However, both of
these formulas assume that the model-implied covariance matrix explains
the item relationships perfectly, even though the residuals are subject
to sampling error. The third formula uses the observed covariance matrix
instead of the model-implied covariance matrix to calculate the observed
total variance, making it the most conservative method for calculating
coefficient omega.
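The following sketch computes the three denominators for a simple single-factor model from lavaan's estimated matrices. It is for illustration only (object names are arbitrary); in practice reliability() should be used, because it also handles multidimensional models, multiple groups, and categorical indicators.
library(lavaan)
fit1f <- cfa('visual =~ x1 + x2 + x3', data = HolzingerSwineford1939, std.lv = TRUE)
EST    <- lavInspect(fit1f, "est")                 # estimated lambda, theta, psi
numer  <- sum(EST$lambda)^2 * drop(EST$psi)        # (sum of loadings)^2 * factor variance
Sigma.hat <- fitted(fit1f)$cov                     # model-implied covariance matrix
S         <- lavInspect(fit1f, "sampstat")$cov     # observed (sample) covariance matrix
omega1 <- numer / (numer + sum(EST$theta))         # congeneric-model denominator
omega2 <- numer / sum(Sigma.hat)                   # model-implied total variance
omega3 <- numer / sum(S)                           # observed total variance
c(omega1 = omega1, omega2 = omega2, omega3 = omega3)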
The average variance extracted (AVE) can be calculated by
AVE = \frac{\bold{1}^\prime
\textrm{diag}\left(\Lambda\Psi\Lambda^\prime\right)\bold{1}}{\bold{1}^\prime
\textrm{diag}\left(\hat{\Sigma}\right) \bold{1}},
Note that this formula is modified from Fornell & Larcker (1981) in the case that factor variances are not 1. The proposed formula from Fornell & Larcker (1981) assumes that the factor variances are 1. Note that AVE will not be provided for factors consisting of items with dual loadings. AVE is the property of items but not the property of factors. AVE is calculated with polychoric correlations when ordinal indicators are used.
Coefficient alpha is by definition applied by treating indicators as numeric
(see Chalmers, 2018), which is consistent with the alpha
function in
the psych
package. When indicators are ordinal, reliability
additionally applies the standard alpha calculation to the polychoric
correlation matrix to return Zumbo et al.'s (2007) "ordinal alpha".
Coefficient omega for categorical items is calculated using Green and Yang's
(2009, formula 21) approach. Three types of coefficient omega indicate
different methods to calculate item total variances. The original formula
from Green and Yang is equivalent to \omega_3
in this function.
Green and Yang did not propose a method for
calculating reliability with a mixture of categorical and continuous
indicators, and we are currently unaware of an appropriate method.
Therefore, when reliability
detects both categorical and continuous
indicators of a factor, an error is returned. If the categorical indicators
load on a different factor(s) than continuous indicators, then reliability
will still be calculated separately for those factors, but
return.total
must be FALSE
(unless omit.factors
is used
to isolate factors with indicators of the same type).
Value
Reliability values (coefficient alpha, coefficients omega, average
variance extracted) of each factor in each group. If there are multiple
factors, a total
column can optionally be included.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Yves Rosseel (Ghent University; Yves.Rosseel@UGent.be)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Bentler, P. M. (1972). A lower-bound method for the dimension-free measurement of internal consistency. Social Science Research, 1(4), 343–357. doi:10.1016/0049-089X(72)90082-8
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74(1), 137–143. doi:10.1007/s11336-008-9100-1
Chalmers, R. P. (2018). On misconceptions and the limited usefulness of ordinal alpha. Educational and Psychological Measurement, 78(6), 1056–1071. doi:10.1177/0013164417727036
Cho, E. (2021) Neither Cronbach’s alpha nor McDonald’s omega: A commentary on Sijtsma and Pfadt. Psychometrika, 86(4), 877–886. doi:10.1007/s11336-021-09801-1
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. doi:10.1007/BF02310555
Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement errors. Journal of Marketing Research, 18(1), 39–50. doi:10.2307/3151312
Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74(1), 155–167. doi:10.1007/s11336-008-9099-3
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
Raykov, T. (2001). Estimation of congeneric scale reliability using covariance structure analysis with nonlinear constraints British Journal of Mathematical and Statistical Psychology, 54(2), 315–323. doi:10.1348/000711001159582
Zumbo, B. D., Gadermann, A. M., & Zeisser, C. (2007). Ordinal versions of coefficients alpha and theta for Likert rating scales. Journal of Modern Applied Statistical Methods, 6(1), 21–29. doi:10.22237/jmasm/1177992180
Zumbo, B. D., & Kroc, E. (2019). A measurement is a choice and Stevens’ scales of measurement do not help make it: A response to Chalmers. Educational and Psychological Measurement, 79(6), 1184–1197. doi:10.1177/0013164419844305
See Also
Examples
data(HolzingerSwineford1939)
HS9 <- HolzingerSwineford1939[ , c("x7","x8","x9")]
HSbinary <- as.data.frame( lapply(HS9, cut, 2, labels=FALSE) )
names(HSbinary) <- c("y7","y8","y9")
HS <- cbind(HolzingerSwineford1939, HSbinary)
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ y7 + y8 + y9 '
fit <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE)
## works for factors with exclusively continuous OR categorical indicators
reliability(fit)
## reliability for ALL indicators only available when they are
## all continuous or all categorical
reliability(fit, omit.factors = "speed", return.total = TRUE)
## loop over visual indicators to calculate alpha if one indicator is removed
for (i in paste0("x", 1:3)) {
cat("Drop x", i, ":\n")
print(reliability(fit, omit.factors = c("textual","speed"),
omit.indicators = i, what = "alpha"))
}
## works for multigroup models and for multilevel models (and both)
data(Demo.twolevel)
## assign clusters to arbitrary groups
Demo.twolevel$g <- ifelse(Demo.twolevel$cluster %% 2L, "type1", "type2")
model2 <- ' group: type1
level: within
fac =~ y1 + L2*y2 + L3*y3
level: between
fac =~ y1 + L2*y2 + L3*y3
group: type2
level: within
fac =~ y1 + L2*y2 + L3*y3
level: between
fac =~ y1 + L2*y2 + L3*y3
'
fit2 <- sem(model2, data = Demo.twolevel, cluster = "cluster", group = "g")
reliability(fit2, what = c("alpha","omega3"))
Calculate the reliability values of a second-order factor
Description
Calculate the reliability values (coefficient omega) of a second-order factor
Usage
reliabilityL2(object, secondFactor, omit.imps = c("no.conv", "no.se"))
Arguments
object |
A lavaan::lavaan or lavaan.mi::lavaan.mi object, expected to contain a least one exogenous higher-order common factor. |
secondFactor |
The name of a single second-order factor in the
model fitted in |
omit.imps |
|
Details
The first coefficient omega formula (described in reliability()) is the
basis of this calculation. The model-implied covariance matrix of a
second-order factor model can be separated into three sources: the
second-order common-factor variance, the residual variance of the
first-order common factors (i.e., not accounted for by the second-order
factor), and the measurement error of the observed indicators:
\hat{\Sigma} = \Lambda \bold{B} \Phi_2 \bold{B}^{\prime}
\Lambda^{\prime} + \Lambda \Psi_{u} \Lambda^{\prime} + \Theta,
where \hat{\Sigma}
is the model-implied covariance matrix,
\Lambda
contains first-order factor loadings, \bold{B}
contains
second-order factor loadings, \Phi_2
is the covariance matrix of the
second-order factor(s), \Psi_{u}
is the covariance matrix of residuals
from first-order factors, and \Theta
is the covariance matrix of the
measurement errors from observed indicators. Thus, we can calculate the
proportion of variance of a composite score calculated from the observed
indicators (e.g., a total score or scale mean) that is attributable to the
second-order factor, i.e. coefficient omega at Level 1:
\omega_{L1} = \frac{\bold{1}^{\prime} \Lambda \bold{B} \Phi_2
\bold{B}^{\prime} \Lambda^{\prime} \bold{1}}{\bold{1}^{\prime} \Lambda
\bold{B} \Phi_2 \bold{B} ^{\prime} \Lambda^{\prime} \bold{1} +
\bold{1}^{\prime} \Lambda \Psi_{u} \Lambda^{\prime} \bold{1} +
\bold{1}^{\prime} \Theta \bold{1}},
where \bold{1}
is the k-dimensional vector of 1 and k is
the number of observed variables.
The model-implied covariance matrix among first-order factors (\Phi_1
)
can be calculated as:
\Phi_1 = \bold{B} \Phi_2 \bold{B}^{\prime} + \Psi_{u},
Thus, the proportion of variance among first-order common factors that is attributable to the second-order factor (i.e., coefficient omega at Level 2) can be calculated as:
\omega_{L2} = \frac{\bold{1}_F^{\prime} \bold{B} \Phi_2
\bold{B}^{\prime} \bold{1}_F}{\bold{1}_F^{\prime} \bold{B} \Phi_2
\bold{B}^{\prime} \bold{1}_F + \bold{1}_F^{\prime} \Psi_{u} \bold{1}_F},
where \bold{1}_F
is the F-dimensional vector of 1 and F
is the number of first-order factors. This Level-2 omega can be interpreted
as an estimate of the reliability of a hypothetical composite calculated
from error-free observable variables representing the first-order common
factors. This might only be meaningful as a thought experiment.
An additional thought experiment is possible: If the observed indicators contained only the second-order common-factor variance and unsystematic measurement error, then there would be no first-order common factors because their unique variances would be excluded from the observed measures. An estimate of this hypothetical composite reliability can be calculated as the partial coefficient omega at Level 1, or the proportion of observed variance explained by the second-order factor after partialling out the uniqueness from the first-order factors:
\omega_{L1} = \frac{\bold{1}^{\prime} \Lambda \bold{B} \Phi_2
\bold{B}^{\prime} \Lambda^{\prime} \bold{1}}{\bold{1}^{\prime} \Lambda
\bold{B} \Phi_2 \bold{B}^{\prime} \Lambda^{\prime} \bold{1} +
\bold{1}^{\prime} \Theta \bold{1}},
Note that if the second-order factor has a direct factor loading on some observed variables, the observed variables will be counted as first-order factors, which might not be desirable.
Value
Reliability values at Levels 1 and 2 of the second-order factor, as well as the partial reliability value at Level 1
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
See Also
Examples
HS.model3 <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
higher =~ visual + textual + speed'
fit6 <- cfa(HS.model3, data = HolzingerSwineford1939)
reliability(fit6) # Should provide a warning for the endogenous variables
reliabilityL2(fit6, "higher")
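## A rough by-hand illustration of the omega_L1 and omega_L2 formulas in the
## Details section (a sketch, NOT the function's internal code; the matrix
## extraction below assumes the factor names used in HS.model3):
est  <- lavInspect(fit6, "est")
fo   <- c("visual", "textual", "speed")              # first-order factors
L    <- est$lambda[ , fo]                            # first-order loadings
B    <- est$beta[fo, "higher", drop = FALSE]         # second-order loadings
Phi2 <- est$psi["higher", "higher", drop = FALSE]    # second-order factor variance
Psiu <- est$psi[fo, fo]                              # first-order factor residuals
Th   <- est$theta                                    # measurement-error (co)variances
u1 <- rep(1, nrow(L)); uF <- rep(1, length(fo))
com <- drop(t(u1) %*% L %*% B %*% Phi2 %*% t(B) %*% t(L) %*% u1)
## omega at Level 1: second-order variance over total composite variance
com / (com + drop(t(u1) %*% L %*% Psiu %*% t(L) %*% u1) + drop(t(u1) %*% Th %*% u1))
## omega at Level 2: second-order variance over first-order composite variance
drop(t(uF) %*% B %*% Phi2 %*% t(B) %*% uF) /
  drop(t(uF) %*% (B %*% Phi2 %*% t(B) + Psiu) %*% uF)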
Residual-center all target indicators by covariates
Description
This function regresses the target variables on the covariate(s) and replaces the target variables with the residuals from those regressions. This procedure is useful for partialling covariates out of the analysis model (Geldhof, Pornprasertmanit, Schoemann, & Little, 2013).
Usage
residualCovariate(data, targetVar, covVar)
Arguments
data |
The desired data to be transformed. |
targetVar |
Variable names or the positions of the indicators that users wish to residual center (as dependent variables) |
covVar |
Covariate names or the positions of the covariates used for residual centering the target variables (as independent variables) |
Value
The data set with the target variables replaced by the residuals
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Geldhof, G. J., Pornprasertmanit, S., Schoemann, A. M., & Little, T. D. (2013). Orthogonalizing through residual centering: Extended applications and caveats. Educational and Psychological Measurement, 73(1), 27–46. doi:10.1177/0013164412445473
See Also
indProd()
For creating the indicator products with no
centering, mean centering, double-mean centering, or residual centering.
Examples
dat <- residualCovariate(attitude, 2:7, 1)
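## A rough base-R sketch of the residual-centering idea (for illustration,
## not the package's internal code): each target column is replaced by the
## residuals from regressing it on the covariate (column 1 = rating).
manual <- attitude
for (j in 2:7) manual[ , j] <- resid(lm(attitude[ , j] ~ attitude[ , 1]))
head(manual)
head(dat)  # compare with residualCovariate() above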
Implement orthogonal or oblique rotation
Description
These functions will implement orthogonal or oblique rotation on standardized factor loadings from a lavaan output.
Usage
orthRotate(object, method = "varimax", ...)
oblqRotate(object, method = "quartimin", ...)
funRotate(object, fun, ...)
Arguments
object |
A lavaan output |
method |
The method of rotation, such as |
fun |
The name of the function used to rotate the standardized solution. The function must take the standardized loading matrix as its first argument and return the |
... |
Additional arguments for the |
Details
These functions rotate the unrotated standardized factor loadings using the GPForth function (orthogonal rotation) or the GPFoblq function (oblique rotation) from the GPArotation package. The resulting rotation matrix is then used to calculate standard errors of the rotated standardized factor loadings via the delta method, with the Jacobian matrix computed numerically by the lav_func_jacobian_simple function.
Value
An EFA-class object containing the rotated EFA solution
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
See Also
Examples
unrotated <- efaUnrotate(HolzingerSwineford1939, nf = 3,
varList = paste0("x", 1:9), estimator = "mlr")
# Orthogonal varimax
out.varimax <- orthRotate(unrotated, method = "varimax")
summary(out.varimax, sort = FALSE, suppress = 0.3)
# Orthogonal Quartimin
orthRotate(unrotated, method = "quartimin")
# Oblique Quartimin
oblqRotate(unrotated, method = "quartimin")
# Geomin
oblqRotate(unrotated, method = "geomin")
# Target rotation
library(GPArotation)
target <- matrix(0, 9, 3)
target[1:3, 1] <- NA
target[4:6, 2] <- NA
target[7:9, 3] <- NA
colnames(target) <- c("factor1", "factor2", "factor3")
## This function works with GPArotation version 2012.3-1
funRotate(unrotated, fun = "targetQ", Target = target)
Fit a lavaan Model to Multiply Imputed Data Sets
Description
This function fits a lavaan model to a list of imputed data sets, and can
also implement multiple imputation for a single data.frame
with
missing observations, using either the Amelia package or the mice package.
Usage
runMI(model, data, fun = "lavaan", ...,
m, miArgs = list(), miPackage = "Amelia", seed = 12345)
lavaan.mi(model, data, ...,
m, miArgs = list(), miPackage = "Amelia", seed = 12345)
cfa.mi(model, data, ...,
m, miArgs = list(), miPackage = "Amelia", seed = 12345)
sem.mi(model, data, ...,
m, miArgs = list(), miPackage = "Amelia", seed = 12345)
growth.mi(model, data, ...,
m, miArgs = list(), miPackage = "Amelia", seed = 12345)
Arguments
model |
The analysis model can be specified using
|
data |
A |
fun |
|
... |
additional arguments to pass to |
m |
|
miArgs |
Additional arguments for the multiple-imputation function
( |
miPackage |
Package to be used for imputation. Currently these
functions only support |
seed |
|
Value
An OLDlavaan.mi object
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
See Also
Examples
## See the new lavaan.mi package
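## A hedged sketch of the replacement workflow in the lavaan.mi package.
## The exact cfa.mi() interface there is an assumption; consult that package's
## documentation and README for the authoritative usage.
# HSMiss <- HolzingerSwineford1939[ , paste0("x", 1:9)]
# HSMiss$x5[HSMiss$x5 <= quantile(HSMiss$x5, .3, na.rm = TRUE)] <- NA
# HS.model <- ' visual  =~ x1 + x2 + x3
#               textual =~ x4 + x5 + x6
#               speed   =~ x7 + x8 + x9 '
# imps <- mice::mice(HSMiss, m = 5, printFlag = FALSE)  # impute missing data
# impList <- mice::complete(imps, action = "all")       # list of imputed data sets
# fit.mi <- lavaan.mi::cfa.mi(HS.model, data = impList)
# summary(fit.mi)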
Deprecated functions in package semTools.
Description
The functions listed below are deprecated and will be defunct
in the near future. When possible, alternative functions with similar
functionality are also mentioned. Help pages for deprecated functions are
available at help("semTools-deprecated")
.
Usage
efaUnrotate(data = NULL, nf, varList = NULL, start = TRUE, aux = NULL,
...)
orthRotate(object, method = "varimax", ...)
oblqRotate(object, method = "quartimin", ...)
funRotate(object, fun, ...)
longInvariance(model, varList, auto = "all", constrainAuto = FALSE,
fixed.x = TRUE, std.lv = FALSE, group = NULL, group.equal = "",
group.partial = "", strict = FALSE, warn = TRUE, debug = FALSE,
quiet = FALSE, fit.measures = "default", baseline.model = NULL,
method = "satorra.bentler.2001", ...)
measurementInvariance(..., std.lv = FALSE, strict = FALSE, quiet = FALSE,
fit.measures = "default", baseline.model = NULL,
method = "satorra.bentler.2001")
measurementInvarianceCat(..., std.lv = FALSE, strict = FALSE,
quiet = FALSE, fit.measures = "default", baseline.model = NULL,
method = "default")
reliability(object, what = c("alpha", "omega", "omega2", "omega3", "ave"),
return.total = FALSE, dropSingle = TRUE, omit.factors = character(0),
omit.indicators = character(0), omit.imps = c("no.conv", "no.se"))
reliabilityL2(object, secondFactor, omit.imps = c("no.conv", "no.se"))
lavTestLRT.mi(object, h1 = NULL, test = c("D3", "D2"),
omit.imps = c("no.conv", "no.se"), asymptotic = FALSE,
pool.robust = FALSE, ...)
lavTestWald.mi(object, constraints = NULL, test = c("D1", "D2"),
asymptotic = FALSE, scale.W = !asymptotic, omit.imps = c("no.conv",
"no.se"), verbose = FALSE, warn = TRUE)
modindices.mi(object, test = c("D2", "D1"), omit.imps = c("no.conv",
"no.se"), standardized = TRUE, cov.std = TRUE,
information = "expected", power = FALSE, delta = 0.1, alpha = 0.05,
high.power = 0.75, sort. = FALSE, minimum.value = 0,
maximum.number = nrow(LIST), na.remove = TRUE, op = NULL)
modificationIndices.mi(object, test = c("D2", "D1"),
omit.imps = c("no.conv", "no.se"), standardized = TRUE, cov.std = TRUE,
information = "expected", power = FALSE, delta = 0.1, alpha = 0.05,
high.power = 0.75, sort. = FALSE, minimum.value = 0,
maximum.number = nrow(LIST), na.remove = TRUE, op = NULL)
modificationindices.mi(object, test = c("D2", "D1"),
omit.imps = c("no.conv", "no.se"), standardized = TRUE, cov.std = TRUE,
information = "expected", power = FALSE, delta = 0.1, alpha = 0.05,
high.power = 0.75, sort. = FALSE, minimum.value = 0,
maximum.number = nrow(LIST), na.remove = TRUE, op = NULL)
lavTestScore.mi(object, add = NULL, release = NULL, test = c("D2", "D1"),
scale.W = !asymptotic, omit.imps = c("no.conv", "no.se"),
asymptotic = is.null(add), univariate = TRUE, cumulative = FALSE,
epc = FALSE, standardized = epc, cov.std = epc, verbose = FALSE,
warn = TRUE, information = "expected")
runMI(model, data, fun = "lavaan", ..., m, miArgs = list(),
miPackage = "Amelia", seed = 12345)
lavaan.mi(model, data, ..., m, miArgs = list(), miPackage = "Amelia",
seed = 12345)
cfa.mi(model, data, ..., m, miArgs = list(), miPackage = "Amelia",
seed = 12345)
sem.mi(model, data, ..., m, miArgs = list(), miPackage = "Amelia",
seed = 12345)
growth.mi(model, data, ..., m, miArgs = list(), miPackage = "Amelia",
seed = 12345)
calculate.D2(w, DF = 0L, asymptotic = FALSE)
Support functions for exploratory factor analysis in lavaan
The efaUnrotate, orthRotate, oblqRotate, and funRotate functions will no longer be supported. These functions allowed users to estimate EFA parameters with lavaan. Instead, users can now directly use lavaan's efa() function.
Exploratory SEM (ESEM) is also supported by lavaan using special operators in lavaan's model.syntax; see https://github.com/yrosseel/lavaan/issues/112 for details.
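A minimal sketch of the replacement workflow follows (the efa() function requires a recent version of lavaan; the rotation shown is just one option):
efa.fit <- efa(data = HolzingerSwineford1939[paste0("x", 1:9)],
               nfactors = 3, rotation = "geomin")
summary(efa.fit)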
Previous measurement-invariance functions
The measurementInvariance, measurementInvarianceCat, and longInvariance functions will no longer be supported. Instead, use the measEq.syntax() function, which is much more flexible and supports a wider range of data (e.g., any mixture of numeric and ordered indicators, any combination of multiple groups and repeated measures, and models fit to multiple imputations with lavaan.mi::lavaan.mi()).
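As a brief sketch (argument names follow the measEq.syntax() help page; this is illustrative rather than exhaustive):
HW.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
## metric (weak) invariance across school groups, returned as a fitted lavaan model
fit.metric <- measEq.syntax(configural.model = HW.model,
                            data = HolzingerSwineford1939, group = "school",
                            group.equal = "loadings", return.fit = TRUE)
summary(fit.metric)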
Reliability
The original reliability function was suboptimally designed. For example, AVE was returned, even though it is not a reliability index. Alpha and several omega-type coefficients were also returned, including the original formula that is inappropriate for models with complex structure. Some features could be controlled by the user for one type of index but not the other: alpha for categorical indicators was returned on both the observed and latent-response scales, but this was not an option for any omega-type indices, and the omegas differed in whether the observed or model-implied covariance matrix was used in the denominator, whereas alpha was only computed using the observed matrix. These inconsistencies have been resolved in the new compRelSEM() function, which returns only one reliability index (per factor, and optionally for the total score) according to the user's requested features, with much more flexibility. Average variance extracted is now available in a dedicated AVE() function.
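As a brief sketch of the replacement functions (the object name follows the fit used in the reliability() examples earlier in this manual):
compRelSEM(fit)                 # omega-type composite reliability per factor
compRelSEM(fit, tau.eq = TRUE)  # tau-equivalent reliability (coefficient alpha)
AVE(fit)                        # average variance extracted, now a separate function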
Higher-Order Reliability
Originally, composite reliability of a single higher-order factor was estimated in a separate function: reliabilityL2. It is now available for multiple higher-order factors in the same model, from the same compRelSEM() function that estimates reliability for first-order factors, using the higher= argument to name higher-order factors in the fitted model.
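For example, using the second-order CFA from the reliabilityL2() examples above (the object and factor names are taken from that example):
compRelSEM(fit6, higher = "higher")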
runMI() and the lavaan.mi class functionality
The runMI() function and support for lavaan.mi-class objects became such a large part of semTools that it made sense to move that functionality to its own package. The lavaan.mi package is now available for users to fit lavaan models to their multiply imputed data. The new package already fixes many bugs and provides many new features that make the semTools OLDlavaan.mi class obsolete. Please discontinue your dependence on semTools::runMI() immediately and transition to the new lavaan.mi package, which also provides a user interface more similar to the one lavaan provides for a single data= set. The README on https://github.com/TDJorgensen/lavaan.mi provides a list of analogous functionality in lavaan and lavaan.mi, and the NEWS file documents new features and other differences from the deprecated semTools::runMI() functionality.
Simulated Data set to Demonstrate Random Allocations of Parcels
Description
A simulated data set with 2 factors, each measured by 9 indicators
Usage
simParcel
Format
A data.frame
with 800 observations of 18 variables.
- f1item1
Item 1 loading on factor 1
- f1item2
Item 2 loading on factor 1
- f1item3
Item 3 loading on factor 1
- f1item4
Item 4 loading on factor 1
- f1item5
Item 5 loading on factor 1
- f1item6
Item 6 loading on factor 1
- f1item7
Item 7 loading on factor 1
- f1item8
Item 8 loading on factor 1
- f1item9
Item 9 loading on factor 1
- f2item1
Item 1 loading on factor 2
- f2item2
Item 2 loading on factor 2
- f2item3
Item 3 loading on factor 2
- f2item4
Item 4 loading on factor 2
- f2item5
Item 5 loading on factor 2
- f2item6
Item 6 loading on factor 2
- f2item7
Item 7 loading on factor 2
- f2item8
Item 8 loading on factor 2
- f2item9
Item 9 loading on factor 2
Source
Data were generated using the simsem
package.
Examples
head(simParcel)
Single Parameter Test Divided from Nested Model Comparison
Description
In comparing two nested models, the \Delta\chi^2
test may indicate that the two models differ. However, as with other omnibus
tests, it does not reveal which fixed parameters or constraints make the two
models different. This function helps researchers identify the responsible
parameter(s).
Usage
singleParamTest(model1, model2, return.fit = FALSE,
method = "satorra.bentler.2001")
Arguments
model1 |
Model 1. |
model2 |
Model 2. Note that the two models must be nested, and the order of parameters in their parameter tables must be the same. Thus, nested models with different scale identification may not be testable by this function. |
return.fit |
Return the submodels fitted by this function |
method |
The method used to calculate the likelihood ratio test. See
|
Details
This function first identifies the differences between the two models. The model with more free parameters is referred to as the parent model, and the model with fewer free parameters is referred to as the nested model. Two tests are implemented here:
- free: The nested model is used as a template. Then, one parameter indicating a difference between the two models is freed, and the new model is compared with the nested model. This process is repeated for all differences between the two models.
- fix: The parent model is used as a template. Then, one parameter indicating a difference between the two models is fixed or constrained to be equal to other parameters, and the new model is compared with the parent model. This process is repeated for all differences between the two models.
- mi: No longer available, because tests based on modification indices are not consistent. For example, if two parameters are equality constrained, the modification index from the first parameter is not equal to that from the second parameter.
Note that this function does not adjust for the inflated Type I error rate from multiple tests.
Value
If return.fit = FALSE, the result tables are provided. A \chi^2 statistic and p value are provided for all methods; note that each \chi^2 is based on 1 df. Expected parameter changes and their standardized forms are also provided.
If return.fit = TRUE, a list with two elements is provided: the first element is the tabular result, and the second element contains the submodels used in the free and fix methods.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Examples
library(lavaan)
# Nested model comparison by hand
HS.model1 <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6'
HS.model2 <- ' visual =~ a*x1 + a*x2 + a*x3
textual =~ b*x4 + b*x5 + b*x6'
m1 <- cfa(HS.model1, data = HolzingerSwineford1939, std.lv = TRUE,
estimator = "MLR")
m2 <- cfa(HS.model2, data = HolzingerSwineford1939, std.lv = TRUE,
estimator = "MLR")
anova(m1, m2)
singleParamTest(m1, m2)
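## A rough sketch of what the "free" method does internally (not the actual
## implementation): start from the nested model m2, free one of the
## equality-constrained loadings, then compare with a 1-df LRT.
HS.free1 <- ' visual =~ x1 + a*x2 + a*x3
              textual =~ b*x4 + b*x5 + b*x6'
m.free1 <- cfa(HS.free1, data = HolzingerSwineford1939, std.lv = TRUE,
               estimator = "MLR")
anova(m2, m.free1)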
## Nested model comparison from the measurementInvariance function
HW.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
models <- measurementInvariance(model = HW.model, data = HolzingerSwineford1939,
group = "school")
singleParamTest(models[[1]], models[[2]])
## Note that the comparison between metric (Model 2) and scalar invariance
## (Model 3) cannot be done by this function because the metric invariance
## model fixes factor means as 0 in Group 2 but the strong invariance model
## frees the factor means in Group 2. Users may use this function to compare
## scalar invariance (Model 3) to a homogeneous-means model.
Finding skewness
Description
Finding skewness (g_{1}
) of an object
Usage
skew(object, population = FALSE)
Arguments
object |
A vector from which to calculate the skewness |
population |
|
Details
The skewness computed by default is g_{1}
, the third standardized
moment of the empirical distribution of object
.
The population parameter skewness \gamma_{1}
formula is
\gamma_{1} = \frac{\mu_{3}}{\mu^{3/2}_{2}},
where \mu_{i}
denotes the i-th central moment.
The skewness formula for sample statistic g_{1}
is
g_{1} = \frac{k_{3}}{k^{3/2}_{2}},
where k_{i}
is the i-th k-statistic.
The approximate sampling variance of the skewness statistic is
Var(\hat{g}_1) = \frac{6}{N},
where N
is the sample size; the standard error is the square root of this variance.
Value
The skewness value, along with a test statistic if population = FALSE
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Weisstein, Eric W. (n.d.). Skewness. Retrieved from MathWorld–A Wolfram Web Resource: http://mathworld.wolfram.com/Skewness.html
See Also
- kurtosis(): Find the univariate excess kurtosis of a variable
- mardiaSkew(): Find Mardia's multivariate skewness of a set of variables
- mardiaKurtosis(): Find Mardia's multivariate kurtosis of a set of variables
Examples
skew(1:5)
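## A rough numeric illustration of the formulas above (a sketch, not the
## function's internal code; simple central-moment estimates are used here,
## whereas skew() uses k-statistics, so results can differ slightly):
x  <- rnorm(200)
m2 <- mean((x - mean(x))^2)      # second central moment
m3 <- mean((x - mean(x))^3)      # third central moment
g1 <- m3 / m2^(3/2)              # skewness estimate
z  <- g1 / sqrt(6 / length(x))   # approximate z statistic using Var = 6/N
c(skew = g1, z = z)
skew(x)                          # compare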
Randomly Split a Data Set into Halves
Description
This function randomly splits a data set into two halves, and saves the resulting data sets to the same folder as the original.
Usage
splitSample(dataset, path = "default", div = 2, type = "default",
name = "splitSample")
Arguments
dataset |
The original data set to be divided. Can be a file path to a
*.csv or *.dat file (headers will automatically be detected) or an R object
(matrix or dataframe). (Windows users: file path must be specified using
FORWARD SLASHES ( |
path |
File path to folder for output data sets. NOT REQUIRED if dataset is a filename. Specify ONLY if dataset is an R object, or desired output folder is not that of original data set. If path is specified as "object", output data sets will be returned as a list, and not saved to hard drive. |
div |
Number of output data sets. NOT REQUIRED if default, 2 halves. |
type |
Output file format ("dat" or "csv"). NOT REQUIRED unless desired output formatting differs from that of input, or dataset is an R object and csv formatting is desired. |
name |
Output file name. NOT REQUIRED unless desired output name differs from that of input, or input dataset is an R object. (If input is an R object and name is not specified, name will be "splitSample".) |
Details
This function randomly orders the rows of a data set, divides the data set into two halves, and saves the halves to the same folder as the original data set, preserving the original formatting. Data set type (*.csv or *.dat) and formatting (headers) are automatically detected, and output data sets will preserve input type and formatting unless specified otherwise. Input can be in the form of a file path (*.dat or *.csv), or an R object (matrix or dataframe). If input is an R object and path is default, output data sets will be returned as a list object.
Value
If path = "object"
, list
of output data sets.
Otherwise, output will saved to hard drive in the same format as input.
Author(s)
Corbin Quick (University of Michigan; corbinq@umich.edu)
Examples
#### Input is .dat file
#splitSample("C:/Users/Default/Desktop/MYDATA.dat")
#### Output saved to "C:/Users/Default/Desktop/" in .dat format
#### Names are "MYDATA_s1.dat" and "MYDATA_s2.dat"
#### Input is R object
## Split the CO2 dataset from the datasets package
library(datasets)
splitMyData <- splitSample(CO2, path = "object")
summary(splitMyData[[1]])
summary(splitMyData[[2]])
#### Output object splitMyData becomes list of output data sets
#### Input is .dat file in "C:/" folder
#splitSample("C:/testdata.dat", path = "C:/Users/Default/Desktop/", type = "csv")
#### Output saved to "C:/Users/Default/Desktop/" in *.csv format
#### Names are "testdata_s1.csv" and "testdata_s2.csv"
#### Input is R object
#splitSample(myData, path = "C:/Users/Default/Desktop/", name = "splitdata")
#### Output saved to "C:/Users/Default/Desktop/" in *.dat format
#### Names are "splitdata_s1.dat" and "splitdata_s2.dat"
Tukey's WSD post-hoc test of means for unequal variance and sample size
Description
This function computes Tukey's WSD post hoc test of means when variances and sample sizes are not equal across groups. It can be used as a post hoc test when comparing latent means in multiple group SEM.
Usage
tukeySEM(m1, m2, var1, var2, n1, n2, ng)
Arguments
m1 |
Mean of group 1. |
m2 |
Mean of group 2. |
var1 |
Variance of group 1. |
var2 |
Variance of group 2. |
n1 |
Sample size of group 1. |
n2 |
Sample size of group 2. |
ng |
Total number of groups to be compared (i.e., the number of groups compared in the omnibus test). |
Details
After conducting an omnibus test of means across three or more groups, researchers often wish to know which sets of means differ at a particular Type I error rate. Tukey's WSD test holds the error rate stable across multiple comparisons of means. This function implements an adaptation of Tukey's WSD test from Maxwell & Delaney (2004) that allows variances and sample sizes to differ across groups.
Value
A vector with three elements:
- q: The q statistic
- df: The degrees of freedom for the q statistic
- p: A p value based on the q statistic, df, and the total number of groups to be compared
Author(s)
Alexander M. Schoemann (East Carolina University; schoemanna@ecu.edu)
References
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Examples
## For a case where three groups have been compared:
## Group 1: mean = 3.91, var = 0.46, n = 246
## Group 2: mean = 3.96, var = 0.62, n = 465
## Group 3: mean = 2.94, var = 1.07, n = 64
## compare group 1 and group 2
tukeySEM(3.91, 3.96, 0.46, 0.62, 246, 465, 3)
## compare group 1 and group 3
tukeySEM(3.91, 2.94, 0.46, 1.07, 246, 64, 3)
## compare group 2 and group 3
tukeySEM(3.96, 2.94, 0.62, 1.07, 465, 64, 3)
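## A hedged sketch of one common unequal-variance adaptation (Games-Howell),
## shown for the group 1 vs. group 2 comparison above. This illustrates the
## kind of computation involved and may not match tukeySEM()'s exact formula.
m1 <- 3.91; m2 <- 3.96; v1 <- 0.46; v2 <- 0.62; n1 <- 246; n2 <- 465; ng <- 3
q  <- abs(m1 - m2) / sqrt((v1/n1 + v2/n2) / 2)
df <- (v1/n1 + v2/n2)^2 / ((v1/n1)^2 / (n1 - 1) + (v2/n2)^2 / (n2 - 1))
p  <- 1 - ptukey(q, nmeans = ng, df = df)
c(q = q, df = df, p = p)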
Fit a lavaan model using 2-Stage Maximum Likelihood (TSML) estimation for missing data.
Description
This function automates 2-Stage Maximum Likelihood (TSML) estimation, optionally with auxiliary variables. Step 1 involves fitting a saturated model to the partially observed data set (to variables in the hypothesized model as well as auxiliary variables related to missingness). Step 2 involves fitting the hypothesized model to the model-implied means and covariance matrix (also called the "EM" means and covariance matrix) as if they were complete data. Step 3 involves correcting the Step-2 standard errors (SEs) and chi-squared statistic to account for additional uncertainty due to missing data (using information from Step 1; see References section for sources with formulas).
Usage
twostage(..., aux, fun, baseline.model = NULL)
lavaan.2stage(..., aux = NULL, baseline.model = NULL)
cfa.2stage(..., aux = NULL, baseline.model = NULL)
sem.2stage(..., aux = NULL, baseline.model = NULL)
growth.2stage(..., aux = NULL, baseline.model = NULL)
Arguments
... |
Arguments passed to the |
aux |
An optional character vector naming auxiliary variable(s) in
|
fun |
The character string naming the lavaan function used to fit the
Step-2 hypothesized model ( |
baseline.model |
An optional character string, specifying the lavaan
|
Details
All variables (including auxiliary variables) are treated as endogenous
variables in the Step-1 saturated model (fixed.x = FALSE
), so data
are assumed continuous, although not necessarily multivariate normal
(dummy-coded auxiliary variables may be included in Step 1, but categorical
endogenous variables in the Step-2 hypothesized model are not allowed). To
avoid assuming multivariate normality, request se = "robust.huber.white"
. CAUTION: In addition to setting fixed.x = FALSE
and conditional.x = FALSE
in lavaan::lavaan()
,
this function will automatically set meanstructure = TRUE
,
estimator = "ML"
, missing = "fiml"
, and test = "standard"
. lavaan::lavaan()
's se
option can only be
set to "standard"
to assume multivariate normality or to
"robust.huber.white"
to relax that assumption.
Value
The twostage object contains 3 fitted lavaan models (saturated, target/hypothesized, and baseline) as well as the names of auxiliary variables. None of the individual models provide the correct model results (except the point estimates in the target model are unbiased). Use the methods in twostage to extract corrected SEs and test statistics.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Savalei, V., & Bentler, P. M. (2009). A two-stage approach to missing data: Theory and application to auxiliary variables. Structural Equation Modeling, 16(3), 477–497. doi:10.1080/10705510903008238
Savalei, V., & Falk, C. F. (2014). Robust two-stage approach outperforms robust full information maximum likelihood with incomplete nonnormal data. Structural Equation Modeling, 21(2), 280–302. doi:10.1080/10705511.2014.882692
See Also
Examples
## impose missing data for example
HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
"ageyr","agemo","school")]
set.seed(12345)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
## use ageyr and agemo as auxiliary variables
out <- cfa.2stage(model = HS.model, data = HSMiss, aux = c("ageyr","agemo"))
## two versions of a corrected chi-squared test results are shown
out
## see Savalei & Bentler (2009) and Savalei & Falk (2014) for details
## the summary additionally provides the parameter estimates with corrected
## standard errors, test statistics, and confidence intervals, along with
## any other options that can be passed to parameterEstimates()
summary(out, standardized = TRUE)
## use parameter labels to fit a more constrained model
modc <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + a*x8 + a*x9
'
outc <- cfa.2stage(model = modc, data = HSMiss, aux = c("ageyr","agemo"))
## use the anova() method to test this constraint
anova(out, outc)
## like for a single model, two corrected statistics are provided
Class for the Results of 2-Stage Maximum Likelihood (TSML) Estimation for Missing Data
Description
This class contains the results of 2-Stage Maximum Likelihood (TSML)
estimation for missing data. The summary
, anova
, vcov
methods return corrected SEs and test statistics. Other methods are
simply wrappers around the corresponding lavaan::lavaan
methods.
Usage
## S4 method for signature 'twostage'
show(object)
## S4 method for signature 'twostage'
summary(object, ...)
## S4 method for signature 'twostage'
anova(object, h1 = NULL, baseline = FALSE)
## S4 method for signature 'twostage'
nobs(object, type = c("ntotal", "ngroups",
"n.per.group", "norig", "patterns", "coverage"))
## S4 method for signature 'twostage'
coef(object, type = c("free", "user"))
## S4 method for signature 'twostage'
vcov(object, baseline = FALSE)
## S4 method for signature 'twostage'
fitted.values(object, model = c("target", "saturated",
"baseline"), type = "moments", labels = TRUE)
## S4 method for signature 'twostage'
fitted(object, model = c("target", "saturated",
"baseline"), type = "moments", labels = TRUE)
## S4 method for signature 'twostage'
residuals(object, type = c("raw", "cor", "normalized",
"standardized"))
## S4 method for signature 'twostage'
resid(object, type = c("raw", "cor", "normalized",
"standardized"))
Arguments
object |
An object of class |
... |
arguments passed to |
h1 |
An object of class |
baseline |
|
type |
The meaning of this argument varies depending on which method it
is used for. Find detailed descriptions in the Value section
under |
model |
|
labels |
|
Value
show |
|
summary |
|
anova |
|
nobs |
|
coef |
|
vcov |
|
fitted.values , fitted |
|
residuals , resid |
|
Slots
saturated
A fitted lavaan::lavaan object containing the saturated model results
target
A fitted lavaan::lavaan object containing the target/hypothesized model results
baseline
A fitted lavaan::lavaan object containing the baseline/null model results
auxNames
A character vector (potentially of
length == 0
) of any auxiliary variable names, if used
Objects from the Class
Objects can be created via the
twostage()
function.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
Examples
# See the example from the twostage function