Encoding: | UTF-8 |
Version: | 0.1-0 |
Title: | Fit Structural Equation Models to Multiply Imputed Data |
Description: | The primary purpose of 'lavaan.mi' is to extend the functionality of the R package 'lavaan', which implements structural equation modeling (SEM). When incomplete data have been multiply imputed, the imputed data sets can be analyzed by 'lavaan' using complete-data estimation methods, but results must be pooled across imputations (Rubin, 1987, <doi:10.1002/9780470316696>). The 'lavaan.mi' package automates the pooling of point and standard-error estimates, as well as a variety of test statistics, using a familiar interface that allows users to fit an SEM to multiple imputations as they would to a single data set using the 'lavaan' package. |
Depends: | R(≥ 4.0), lavaan(≥ 0.6-18), methods |
Imports: | stats, utils |
Suggests: | Amelia, MASS, mice, parallel, testthat |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
LazyData: | yes |
LazyLoad: | yes |
URL: | https://github.com/TDJorgensen/lavaan.mi |
BugReports: | https://github.com/TDJorgensen/lavaan.mi/issues |
Date: | 2025-03-07 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-03-07 11:54:30 UTC; terrence |
Author: | Terrence D. Jorgensen
|
Maintainer: | Terrence D. Jorgensen <TJorgensen314@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-03-10 14:30:02 UTC |
List of imputed Holzinger & Swineford (1939) datasets
Description
A version of the classic Holzinger and Swineford (1939) dataset, with
missing data imposed on variables x5
and x9
:
Details
-
x5
is missing not at random (MNAR) by deleting the lowest 30% ofx5
values. -
x9
is missing at random (MAR) conditional on age, by deletingx5
values for the youngest 30% of subjects in the data.
The data are imputed 20 times using the syntax shown in the example.
The data include only age and school variables, along with 9 tests
(x1
through x9
).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Source
The lavaan package.
References
Holzinger, K., & Swineford, F. (1939). A study in factor analysis: The stability of a bifactor solution. Supplementary Educational Monograph, no. 48. Chicago, IL: University of Chicago Press.
See Also
lavaan::HolzingerSwineford1939
Examples
data(HolzingerSwineford1939, package = "lavaan")
## impose missing data for example
HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
"ageyr","agemo","school")]
set.seed(123)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)
## impute missing data with Amelia
library(Amelia)
set.seed(456)
HS.amelia <- amelia(HSMiss, m = 20, noms = "school", p2s = FALSE)
HS20imps <- HS.amelia$imputations
List of imputed Holzinger & Swineford (1939) dichotomized data
Description
A version of the classic Holzinger and Swineford (1939) dataset, with
missing data imposed on variables x5
and x9
:
Details
-
x5
is missing not at random (MNAR) by deleting the lowest 30% ofx5
values. -
x9
is missing at random (MAR) conditional on age, by deletingx9
values for the youngest 30% of subjects in the data.
The data are then dichotomized using a median split, and imputed 5 times
using the syntax shown in the example. The data include only the 9 tests
(x1
through x9
) and school.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Source
The lavaan package.
References
Holzinger, K., & Swineford, F. (1939). A study in factor analysis: The stability of a bifactor solution. Supplementary Educational Monograph, no. 48. Chicago, IL: University of Chicago Press.
See Also
lavaan::HolzingerSwineford1939
Examples
data(HolzingerSwineford1939, package = "lavaan")
## impose missing data for example
HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
"ageyr","agemo","school")]
set.seed(123)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)
## median split
HSbinary <- as.data.frame( lapply(HSMiss[ , paste0("x", 1:9)],
FUN = cut, breaks = 2, labels = FALSE) )
HSbinary$school <- HSMiss$school
## impute binary missing data using mice package
library(mice)
set.seed(456)
miceImps <- mice(HSbinary)
## save imputations in a list of data.frames
binHS5imps <- list()
for (i in 1:miceImps$m) binHS5imps[[i]] <- complete(miceImps, action = i)
Calculate the "D2" statistic
Description
This is a utility function used to calculate the "D2" statistic for pooling
test statistics across multiple imputations. This function is called by
several functions used for lavaan.mi objects, such as
lavTestLRT.mi()
, lavTestWald.mi()
, and
lavTestScore.mi()
. But this function can be used for any general
scenario because it only requires a vector of \chi^2
statistics (one
from each imputation) and the degrees of freedom for the test statistic.
See Li, Meng, Raghunathan, & Rubin (1991) and Enders (2010, chapter 8) for
details about how it is calculated.
Usage
calculate.D2(w, DF = 0L, asymptotic = FALSE)
Arguments
w |
|
DF |
degrees of freedom (df) of the |
asymptotic |
|
Value
A numeric
vector containing the test statistic, df,
its p value, and 2 missing-data diagnostics: the relative invrease
in variance (RIV, or average for multiparameter tests: ARIV) and the
fraction missing information (FMI = ARIV / (1 + ARIV)).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
See Also
lavTestLRT.mi()
, lavTestWald.mi()
,
lavTestScore.mi()
Examples
## generate a vector of chi-squared values, just for example
DF <- 3 # degrees of freedom
M <- 20 # number of imputations
CHI <- rchisq(M, DF)
## pool the "results"
calculate.D2(CHI, DF) # by default, an F statistic is returned
calculate.D2(CHI, DF, asymptotic = TRUE) # asymptotically chi-squared
## generate standard-normal values, for an example of Wald z tests
Z <- rnorm(M)
calculate.D2(Z) # default DF = 0 will square Z to make chisq(DF = 1)
## F test is equivalent to a t test with the denominator DF
Covariance and Correlation Residuals
Description
This function calculates residuals for sample moments (e.g., means and (co)variances, means) from a lavaan model fitted to multiple imputed data sets, along with summary and inferential statistics about the residuals.
Usage
## S4 method for signature 'lavaan.mi'
residuals(object, type = "raw", omit.imps = c("no.conv", "no.se"), ...)
## S4 method for signature 'lavaan.mi'
resid(object, type = "raw", omit.imps = c("no.conv", "no.se"), ...)
lavResiduals.mi(object, omit.imps = c("no.conv", "no.se"), ...)
Arguments
object |
An object of class |
type |
|
omit.imps |
|
... |
Arguments passed to |
Value
A list
of residuals and other information (see lavaan::lavResiduals()
).
The standard residuals()
(and resid()
alias) method simply calls
lavResiduals.mi(..., zstat=FALSE, summary=FALSE)
.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
lavaan::lavResiduals()
for details about other arguments.
Examples
data(HS20imps) # import a list of 20 imputed data sets
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
## fit model to 20 imputed data sets
fit <- cfa.mi(HS.model, data = HS20imps)
## default type = "cor.bentler" (standardized covariance residuals)
lavResiduals.mi(fit, zstat = FALSE)
## SRMR is in the $summary
## correlation residuals
lavResiduals.mi(fit, zstat = FALSE, type = "cor")
## CRMR is in the $summary
## raw covariance residuals
lavResiduals.mi(fit, type = "raw") # zstat=TRUE by default
## RMR is in the $summary
## "normalized" residuals are in $cov.z
## The standard resid() and residuals() method simply call lavResiduals.mi()
## with arguments to display only the residuals ("raw" by default).
resid(fit)
residuals(fit, type = "cor.bollen") # same as type = "cor"
Likelihood Ratio Test for Multiple Imputations
Description
Likelihood ratio test (LRT) for lavaan models fitted to multiple imputed data sets.
Usage
lavTestLRT.mi(
object,
...,
modnames = NULL,
asANOVA = TRUE,
pool.method = c("D4", "D3", "D2"),
omit.imps = c("no.conv", "no.se"),
asymptotic = FALSE,
pool.robust = FALSE
)
## S4 method for signature 'lavaan.mi'
anova(object, ...)
Arguments
object |
An object of class lavaan.mi |
... |
Additional objects of class lavaan.mi, as
well as arguments passed to |
modnames |
Optional |
asANOVA |
|
pool.method |
Find additional details in Enders (2010, chapter 8). |
omit.imps |
|
asymptotic |
|
pool.robust |
|
Details
The "D2"
method is available using any estimator and test statistic.
When using a likelihood-based estimator, 2 additional methods are available
to pool the LRT.
The Meng & Rubin (1992) method, commonly referred to as
"D3"
. This method has many problems, discussed in Chan & Meng (2022).The Chan & Meng (2022) method, referred to as
"D4"
by Grund et al. (2023), resolves problems with"D3"
.
When "D2"
is not explicitly requested in situations it is the only
applicable method, (e.g., DWLS for categorical outcomes), users are notified
that pool.method
was set to "D2"
.
pool.method = "Mplus"
implies "D3"
and asymptotic = TRUE
(see Asparouhov & Muthen, 2010).
Note that the anova()
method simply calls lavTestLRT.mi()
.
Value
When
asANOVA=TRUE
, returns an object of class stats::anova with a a test of model fit for a single model (object
) or test(s) of the difference(s) in fit between nested models passed via...
(either anF
or\chi^2
statistic, depending on theasymptotic
argument), its degrees of freedom, its p value, and 2 missing-data diagnostics: the relative increase in variance (RIV = FMI / (1-
FMI)) and the fraction of missing information (FMI = RIV / (1 + RIV)).When
asANOVA=FALSE
, returns a vector containing the LRT statistic for a single model or comparison of a single pair of models, or adata.frame
of multiple model comparisons. Robust statistics will also include the average (across imputations) scaling factor and (if relevant) shift parameter(s), unlesspool.robust = TRUE
. When usingpool.method = "D3"
or"D4"
, the vector for a single model also includes its average log-likelihood and information criteria.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Based on source code for lavaan::lavTestLRT()
by Yves Rosseel.
References
Asparouhov, T., & Muthen, B. (2010). Chi-square statistics with multiple imputation. Technical Report. Retrieved from http://www.statmodel.com/
Chan, K. W., & Meng, X. L. (2022). Multiple improvements of multiple imputation likelihood ratio tests. Statistica Sinica, 32, 1489–1514. doi:10.5705/ss.202019.0314
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Grund, S., Lüdtke, O., & Robitzsch, A. (2023). Pooling methods for likelihood ratio tests in multiply imputed data sets. Psychological Methods, 28(5), 1207–1221. doi:10.1037/met0000556
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Meng, X.-L., & Rubin, D. B. (1992). Performing likelihood ratio tests with multiply-imputed data sets. Biometrika, 79(1), 103–111. doi:10.2307/2337151
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley. doi:10.1002/9780470316696
See Also
lavaan::lavTestLRT()
for arguments that can be passed via ...,
and use lavaan::fitMeasures()
to obtain fit indices calculated from pooled
test statistics.
Examples
data(HS20imps) # import a list of 20 imputed data sets
## specify CFA model from ?lavaan::cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
fit1 <- cfa.mi(HS.model, data = HS20imps, estimator = "mlm")
## By default, pool.method = "D4".
## Must request an asymptotic chi-squared statistic
## in order to accommodate a robust correction.
lavTestLRT.mi(fit1, asymptotic = TRUE)
## or anova(fit1, asymptotic = TRUE)
## Comparison with more constrained (nested) models: parallel indicators
HS.parallel <- '
visual =~ x1 + 1*x2 + 1*x3
textual =~ x4 + 1*x5 + 1*x6
speed =~ x7 + 1*x8 + 1*x9
'
fitp <- cfa.mi(HS.parallel, data = HS20imps, estimator = "mlm")
## Even more constrained model: orthogonal factors
fit0 <- cfa.mi(HS.parallel, data = HS20imps, estimator = "mlm",
orthogonal = TRUE)
## Compare 3 models, and pass the lavTestLRT(method=) argument
lavTestLRT.mi(fit1, fit0, fitp, asymptotic = TRUE,
method = "satorra.bentler.2010")
## For a single model, you can request a vector instead of an anova-class
## table in order to see naive information criteria (only using D3 or D4),
## which are calculated using the average log-likelihood across imputations.
lavTestLRT.mi(fit1, asANOVA = FALSE)
## When using a least-squares (rather than maximum-likelihood) estimator,
## only the D2 method is available. For example, ordered-categorical data:
data(binHS5imps) # import a list of 5 imputed data sets
## fit model using default DWLS estimation
fit1c <- cfa.mi(HS.model , data = binHS5imps, ordered = TRUE)
fit0c <- cfa.mi(HS.parallel, data = binHS5imps, ordered = TRUE,
orthogonal = TRUE)
## Using D2, you can either robustify the pooled naive statistic ...
lavTestLRT.mi(fit1c, fit0c, asymptotic = TRUE, pool.method = "D2")
## ... or pool the robust chi-squared statistic (NOT recommended)
lavTestLRT.mi(fit1c, fit0c, asymptotic = TRUE, pool.method = "D2",
pool.robust = TRUE)
## When calculating fit indices, you can pass lavTestLRT.mi() arguments:
fitMeasures(fit1c, output = "text",
# lavTestLRT.mi() arguments:
pool.method = "D2", pool.robust = TRUE)
Score Test for Multiple Imputations
Description
Score test (or "Lagrange multiplier" test) for lavaan models fitted to multiple imputed data sets. Statistics for releasing one or more fixed or constrained parameters in model can be calculated by pooling the gradient and information matrices pooled across imputed data sets in a method proposed by Mansolf, Jorgensen, & Enders (2020)—analogous to the "D1" Wald test proposed by Li, Meng, Raghunathan, & Rubin's (1991)—or by pooling the complete-data score-test statistics across imputed data sets (i.e., "D2"; Li et al., 1991).
Usage
lavTestScore.mi(
object,
add = NULL,
release = NULL,
pool.method = c("D2", "D1"),
scale.W = !asymptotic,
omit.imps = c("no.conv", "no.se"),
asymptotic = is.null(add),
univariate = TRUE,
cumulative = FALSE,
epc = FALSE,
standardized = epc,
cov.std = epc,
verbose = FALSE,
warn = TRUE,
information = "expected"
)
Arguments
object |
An object of class lavaan.mi. |
add |
Either a |
release |
Vector of |
pool.method |
|
scale.W |
|
omit.imps |
|
asymptotic |
|
univariate |
|
cumulative |
|
epc |
|
standardized |
If |
cov.std |
|
verbose |
|
warn |
|
information |
|
Value
A list containing at least one data.frame
:
$test
: The total score test, with columns for the score test statistic (X2
), its degrees of freedom (df
), its p value under the\chi^2
distribution (p.value
), and ifasymptotic=FALSE
, the average relative invrease in variance (ARIV) used to calculate the denominator df is also returned as a missing-data diagnostic, along with the fraction missing information (FMI = ARIV / (1 + ARIV)).$uni
: Optional (ifunivariate=TRUE
). Each 1-df score test, equivalent to modification indices. Also includes EPCs ifepc=TRUE
, and RIV and FMI ifasymptotic=FALSE
.$cumulative
: Optional (ifcumulative=TRUE
). Cumulative score tests, with ARIV and FMI ifasymptotic=FALSE
.$epc
: Optional (ifepc=TRUE
). Parameter estimates, expected parameter changes, and expected parameter values if ALL the tested constraints were freed.
See lavaan::lavTestScore()
for details.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Based on source code for lavaan::lavTestScore()
by Yves Rosseel
pool.method = "D1"
method proposed by
Maxwell Mansolf (University of California, Los Angeles;
mamansolf@gmail.com)
References
Bentler, P. M., & Chou, C.-P. (1992). Some new covariance structure model improvement statistics. Sociological Methods & Research, 21(2), 259–282. doi:10.1177/0049124192021002006
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Mansolf, M., Jorgensen, T. D., & Enders, C. K. (2020). A multiple imputation score test for model modification in structural equation models. Psychological Methods, 25(4), 393–411. doi:10.1037/met0000243
See Also
Examples
data(HS20imps) # import a list of 20 imputed data sets
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
speed =~ c(L1, L1)*x7 + c(L1, L1)*x8 + c(L1, L1)*x9
'
fit <- cfa.mi(HS.model, data = HS20imps, group = "school", std.lv = TRUE)
## Mode 1: Score test for releasing equality constraints
## default test: Li et al.'s (1991) "D2" method
lavTestScore.mi(fit, cumulative = TRUE)
## Li et al.'s (1991) "D1" method,
## adapted for score tests by Mansolf et al. (2020)
lavTestScore.mi(fit, pool.method = "D1")
## Mode 2: Score test for adding currently fixed-to-zero parameters
lavTestScore.mi(fit, add = 'x7 ~~ x8 + x9')
Wald Test for Multiple Imputations
Description
Wald test for testing a linear hypothesis about the parameters of lavaan models fitted to multiple imputed data sets. Statistics for constraining one or more free parameters in a model can be calculated from the pooled point estimates and asymptotic covariance matrix of model parameters using Rubin's (1987) rules, or by pooling the Wald test statistics across imputed data sets (Li, Meng, Raghunathan, & Rubin, 1991).
Usage
lavTestWald.mi(
object,
constraints = NULL,
pool.method = c("D1", "D2"),
asymptotic = FALSE,
scale.W = !asymptotic,
omit.imps = c("no.conv", "no.se"),
verbose = FALSE,
warn = TRUE
)
Arguments
object |
An object of class lavaan.mi. |
constraints |
A |
pool.method |
|
asymptotic |
|
scale.W |
|
omit.imps |
|
verbose |
|
warn |
|
Details
The constraints are specified using the "=="
operator.
Both the left-hand side and the right-hand side of the equality can contain
a linear combination of model parameters, or a constant (like zero).
The model parameters must be specified by their user-specified labels from
the link[lavaan]{model.syntax}
. Names of defined parameters
(using the ":=" operator) can be included too.
Value
A vector containing the Wald test statistic (either an F
or
\chi^2
statistic, depending on the asymptotic
argument),
the degrees of freedom (numerator and denominator, if
asymptotic = FALSE
), and a p value. If
asymptotic = FALSE
, the relative invrease in variance (RIV, or
average for multiparameter tests: ARIV) used to calculate the denominator
df is also returned as a missing-data diagnostic, along with the
fraction missing information (FMI = ARIV / (1 + ARIV)).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Based on source code for lavaan::lavTestWald()
by Yves Rosseel
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley. doi:10.1002/9780470316696
See Also
Examples
data(HS20imps) # import a list of 20 imputed data sets
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + b1*x2 + x3
textual =~ x4 + b2*x5 + x6
speed =~ x7 + b3*x8 + x9
'
fit <- cfa.mi(HS.model, data = HS20imps)
## Testing whether a single parameter equals zero yields the 'chi-square'
## version of the Wald z statistic from the summary() output, or the
## 'F' version of the t statistic from the summary() output, depending
## whether asymptotic = TRUE or FALSE
lavTestWald.mi(fit, constraints = "b1 == 0") # default D1 statistic
lavTestWald.mi(fit, constraints = "b1 == 0", pool.method = "D2") # D2 statistic
## The real advantage is simultaneously testing several equality
## constraints, or testing more complex constraints:
con <- '
2*b1 == b3
b2 - b3 == 0
'
lavTestWald.mi(fit, constraints = con) # default F statistic
lavTestWald.mi(fit, constraints = con, asymptotic = TRUE) # chi-squared
Fit a lavaan Model to Multiple Imputed Data Sets
Description
This function fits a lavaan model to a list of imputed data sets.
Usage
lavaan.mi(model, data, ...)
cfa.mi(model, data, ...)
sem.mi(model, data, ...)
growth.mi(model, data, ...)
Arguments
model |
The analysis model can be specified using lavaan
|
data |
A a |
... |
additional arguments to pass to |
Value
A lavaan.mi object
Note
This functionality was originally provided via runMI()
in the
semTools
package, but there are differences. See the README file
on the GitHub page for this package (find link in DESCRIPTION).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley. doi:10.1002/9780470316696
See Also
poolSat()
for a more efficient method to obtain SEM results
for multiple imputations
Examples
data(HS20imps) # import a list of 20 imputed data sets
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
## fit model to imputed data sets
fit <- cfa.mi(HS.model, data = HS20imps)
summary(fit, fit.measures = TRUE, fmi = TRUE)
summary(fit, standardized = "std.all", rsquare = TRUE)
## You can pass other lavaanList() arguments, such as FUN=, which allows
## you to save any custom output from each imputation's fitted model.
## An example with ordered-categorical data:
data(binHS5imps) # import a list of 5 imputed data sets
## Define a function to save a list with 2 custom outputs per imputation:
## - zero-cell tables
## - the obsolete "WRMR" fit index
myCustomFunc <- function(object) {
list(wrmr = lavaan::fitMeasures(object, "wrmr"),
zeroCells = lavaan::lavInspect(object, "zero.cell.tables"))
}
## fit the model
catout <- cfa.mi(HS.model, data = binHS5imps, ordered = TRUE,
FUN = myCustomFunc)
## pooled results
summary(catout)
## extract custom output (per imputation)
sapply(catout@funList, function(x) x$wrmr) # WRMR for each imputation
catout@funList[[1]]$zeroCells # zero-cell tables for first imputation
catout@funList[[2]]$zeroCells # zero-cell tables for second imputation ...
Class for a lavaan Model Fitted to Multiple Imputations
Description
This class extends the lavaan::lavaanList class, created by fitting a lavaan model to a list of data sets. In this case, the list of data sets are multiple imputations of missing data.
Usage
## S4 method for signature 'lavaan.mi'
show(object)
## S4 method for signature 'lavaan.mi'
summary(
object,
header = TRUE,
fit.measures = FALSE,
fm.args = list(standard.test = "default", scaled.test = "default", rmsea.ci.level =
0.9, rmsea.h0.closefit = 0.05, rmsea.h0.notclosefit = 0.08, robust = TRUE,
cat.check.pd = TRUE),
estimates = TRUE,
ci = FALSE,
standardized = FALSE,
std = standardized,
cov.std = TRUE,
rsquare = FALSE,
fmi = FALSE,
asymptotic = FALSE,
scale.W = !asymptotic,
omit.imps = c("no.conv", "no.se"),
remove.unused = TRUE,
modindices = FALSE,
nd = 3L,
...
)
## S4 method for signature 'lavaan.mi'
nobs(object, total = TRUE)
## S4 method for signature 'lavaan.mi'
coef(object, type = "free", labels = TRUE, omit.imps = c("no.conv", "no.se"))
## S4 method for signature 'lavaan.mi'
vcov(
object,
type = c("pooled", "between", "within", "ariv"),
scale.W = TRUE,
omit.imps = c("no.conv", "no.se")
)
## S4 method for signature 'lavaan.mi'
fitted(object, omit.imps = c("no.conv", "no.se"))
## S4 method for signature 'lavaan.mi'
fitted.values(object, omit.imps = c("no.conv", "no.se"))
## S4 method for signature 'lavaan.mi'
fitMeasures(
object,
fit.measures = "all",
baseline.model = NULL,
h1.model = NULL,
fm.args = list(standard.test = "default", scaled.test = "default", rmsea.ci.level =
0.9, rmsea.h0.closefit = 0.05, rmsea.h0.notclosefit = 0.08, robust = 0.08,
cat.check.pd = TRUE),
output = "vector",
omit.imps = c("no.conv", "no.se"),
...
)
## S4 method for signature 'lavaan.mi'
fitmeasures(
object,
fit.measures = "all",
baseline.model = NULL,
h1.model = NULL,
fm.args = list(standard.test = "default", scaled.test = "default", rmsea.ci.level =
0.9, rmsea.h0.closefit = 0.05, rmsea.h0.notclosefit = 0.08, robust = 0.08,
cat.check.pd = TRUE),
output = "vector",
omit.imps = c("no.conv", "no.se"),
...
)
Arguments
object |
An object of class lavaan.mi |
header , fit.measures , fm.args , estimates , ci , standardized , std , cov.std , rsquare , remove.unused , modindices , nd , output |
See descriptions of |
fmi |
|
asymptotic |
|
scale.W |
|
omit.imps |
|
... |
Additional arguments passed to |
total |
|
type |
The meaning of this argument varies depending on which method it
it used for. Find detailed descriptions in the Value section
under |
labels |
|
baseline.model , h1.model |
Value
coef |
|
vcov |
|
fitted.values |
|
fitted |
alias for |
nobs |
|
fitMeasures |
|
fitmeasures |
alias for |
show |
|
summary |
|
Slots
coefList
list
of estimated coefficients in matrix format (one per imputation) as output bylavInspect(fit, "est")
phiList
list
of model-implied latent-variable covariance matrices (one per imputation) as output bylavInspect(fit, "cov.lv")
miList
list
of modification indices output bylavaan::modindices()
lavListCall
call to
lavaan::lavaanList()
used to fit the model to the list of imputed data sets in@DataList
, stored as alist
of argumentsconvergence
list
oflogical
vectors indicating whether, for each imputed data set, (1) the model converged on a solution, (2) SEs could be calculated, (3) the (residual) covariance matrix of latent variables (\Psi
) is non-positive-definite, and (4) the residual covariance matrix of observed variables (\Theta
) is non-positive-definite.version
Named
character
vector indicating thelavaan
andlavaan.mi
version numbers.DataList
The
list
of imputed data setsSampleStatsList
List of output from
lavInspect(fit, "sampstat")
applied to each fitted model.ParTableList,vcovList,testList,baselineList
h1List
See lavaan::lavaanList. An additional element is added to the
list
:$PT
is the "saturated" model's parameter table, returned bylavaan::lav_partable_unrestricted()
.call,Options,ParTable,pta,Data,Model,meta,timingList,CacheList,optimList,impliedList,loglikList,internalList,funList,external
By default,
lavaan.mi()
does not populate the remaining@*List
slots from the lavaan::lavaanList class. But they can be added to the call using thestore.slots=
argument (passed tolavaan::lavaanList()
via ...).
Objects from the Class
See the lavaan.mi()
function
for details. Wrapper functions include cfa.mi()
,
sem.mi()
, and growth.mi()
.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley. doi:10.1002/9780470316696
Examples
data(HS20imps) # import a list of 20 imputed data sets
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
## fit model to imputed data sets
fit <- cfa.mi(HS.model, data = HS20imps)
## vector of pooled coefficients
coef(fit)
## their pooled asymptotic covariance matrix
vcov(fit)
## which is the weighted sum of within- and between-imputation components
vcov(fit, type = "within")
vcov(fit, type = "between")
## covariance matrix of observed variables,
## as implied by pooled estimates
fitted(fit)
## custom null model for CFI
HS.parallel <- '
visual =~ x1 + 1*x2 + 1*x3
textual =~ x4 + 1*x5 + 1*x6
speed =~ x7 + 1*x8 + 1*x9
'
fit0 <- cfa.mi(HS.parallel, data = HS20imps, orthogonal = TRUE)
fitMeasures(fit, baseline.model = fit0, fit.measures = "default",
output = "text")
## See ?lavaan.mi help page for more examples
Modification Indices for Multiple Imputations
Description
Modification indices (1-df Lagrange multiplier tests) from a latent variable model fitted to multiple imputed data sets. Statistics for releasing one or more fixed or constrained parameters in model can be calculated by pooling the gradient and information matrices across imputed data sets in a method proposed by Mansolf, Jorgensen, & Enders (2020)—analogous to the "D1" Wald test proposed by Li, Meng, Raghunathan, & Rubin (1991)—or by pooling the complete-data score-test statistics across imputed data sets (i.e., "D2"; Li et al., 1991).
Usage
modindices.mi(
object,
pool.method = c("D2", "D1"),
omit.imps = c("no.conv", "no.se"),
standardized = TRUE,
cov.std = TRUE,
information = "expected",
power = FALSE,
delta = 0.1,
alpha = 0.05,
high.power = 0.75,
sort. = FALSE,
minimum.value = 0,
maximum.number = nrow(LIST),
na.remove = TRUE,
op = NULL
)
modificationIndices.mi(
object,
pool.method = c("D2", "D1"),
omit.imps = c("no.conv", "no.se"),
standardized = TRUE,
cov.std = TRUE,
information = "expected",
power = FALSE,
delta = 0.1,
alpha = 0.05,
high.power = 0.75,
sort. = FALSE,
minimum.value = 0,
maximum.number = nrow(LIST),
na.remove = TRUE,
op = NULL
)
modificationindices.mi(
object,
pool.method = c("D2", "D1"),
omit.imps = c("no.conv", "no.se"),
standardized = TRUE,
cov.std = TRUE,
information = "expected",
power = FALSE,
delta = 0.1,
alpha = 0.05,
high.power = 0.75,
sort. = FALSE,
minimum.value = 0,
maximum.number = nrow(LIST),
na.remove = TRUE,
op = NULL
)
Arguments
object |
An object of class lavaan.mi |
pool.method |
|
omit.imps |
|
standardized |
|
cov.std |
|
information |
|
power |
|
delta |
The value of the effect size, as used in the post-hoc power
computation, currently using the unstandardized metric of the |
alpha |
The significance level used for deciding if the modification index is statistically significant or not. |
high.power |
If the computed power is higher than this cutoff value,
the power is considered 'high'. If not, the power is considered 'low'.
This affects the values in the |
sort. |
|
minimum.value |
|
maximum.number |
|
na.remove |
|
op |
|
Value
A data.frame
containing modification indices and (S)EPCs.
Note
When pool.method = "D2"
, each (S)EPC will be pooled by taking its
average across imputations. When pool.method = "D1"
, EPCs will be
calculated in the standard way using the pooled gradient and information,
and SEPCs will be calculated by standardizing the EPCs using model-implied
(residual) variances.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Based on source code for lavaan::modindices()
by Yves Rosseel
pool.method = "D1"
method proposed by
Maxwell Mansolf (University of California, Los Angeles;
mamansolf@gmail.com)
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data.Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Mansolf, M., Jorgensen, T. D., & Enders, C. K. (2020). A multiple imputation score test for model modification in structural equation models. Psychological Methods, 25(4), 393–411. doi:10.1037/met0000243
See Also
Examples
data(HS20imps) # import a list of 20 imputed data sets
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
fit <- cfa.mi(HS.model, data = HS20imps)
modindices.mi(fit) # default: Li et al.'s (1991) "D2" method
## Li et al.'s (1991) "D1" method,
## adapted for score tests by Mansolf et al. (2020)
modindices.mi(fit, pool.method = "D1")
Pooled Parameter Estimates
Description
This function pools parameter estimates from a lavaan model fitted to multiple imputed data sets.
Usage
parameterEstimates.mi(
object,
se = TRUE,
zstat = se,
pvalue = zstat,
ci = TRUE,
level = 0.95,
fmi = FALSE,
standardized = FALSE,
cov.std = TRUE,
rsquare = FALSE,
asymptotic = FALSE,
scale.W = !asymptotic,
omit.imps = c("no.conv", "no.se"),
remove.system.eq = TRUE,
remove.eq = TRUE,
remove.ineq = TRUE,
remove.def = FALSE,
remove.nonfree = FALSE,
remove.unused = FALSE,
output = "data.frame",
header = FALSE
)
Arguments
object |
An object of class |
se , zstat , pvalue , ci , level , standardized , cov.std , rsquare , remove.system.eq , remove.eq , remove.ineq , remove.def , remove.nonfree , remove.unused , output , header |
|
fmi |
Thus, RIV = FMI / (1 |
asymptotic |
|
scale.W |
|
omit.imps |
|
Value
A data.frame
, analogous to lavaan::parameterEstimates()
, but estimates,
SEs, and tests are pooled across imputations.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley. doi:10.1002/9780470316696
See Also
standardizedSolution.mi()
to obtain inferential statistics for pooled
standardized parameter estimates.
Examples
data(HS20imps) # import a list of 20 imputed data sets
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
## fit model to 20 imputed data sets
fit <- cfa.mi(HS.model, data = HS20imps)
## pooled estimates, with various optional features:
parameterEstimates.mi(fit, asymptotic = TRUE, rsquare = TRUE)
parameterEstimates.mi(fit, ci = FALSE, fmi = TRUE, output = "text")
parameterEstimates.mi(fit, standardized = "std.all", se = FALSE)
Fit a Saturated lavaan
Model to Multiple Imputed Data Sets
Description
This function fits a saturated model to a list of imputed data sets, and returns a list of pooled summary statistics to treat as data.
Usage
poolSat(
data,
...,
return.fit = FALSE,
scale.W = TRUE,
omit.imps = c("no.conv", "no.se")
)
Arguments
data |
A |
... |
Additional arguments passed to |
return.fit |
|
scale.W |
|
omit.imps |
|
Value
If return.fit=TRUE
, a lavaan.mi object.
Otherwise, an object of class lavMoments
, which is a list
that contains at least $sample.cov
and $sample.nobs
,
potentially also $sample.mean
, $sample.th
, $NACOV
,
and $WLS.V
. Also contains $lavOptions
that will be passed
to lavaan(...)
.
Note
The $lavOptions
list will always set fixed.x=FALSE
and
conditional.x=FALSE
. Users should not override those options when
calling lavaan::lavaan()
because doing so would yield
incorrect SEs and test statistics. Computing the correct
$NACOV
argument would depend on which specific variables are
treated as fixed, which would require an argument to poolSat()
for
users to declare names of exogenous variables. This has not yet been
programmed, but that feature may be added in the future in order to reduce
the number of parameters to estimate.
However, if "exogenous" predictors were incomplete and imputed, then they
are not truly fixed (i.e., unvarying across samples), so treating them as
fixed would be illogical and yield biased SEs and test statistics.
The information returned by poolSat()
must assume that any fitted
SEM will include all the variables in $sample.cov
and (more
importantly) in $NACOV
. Although lavaan
can drop unused
rows/columns from $sample.cov
, it cannot be expected to drop the
corresponding sampling variances of those eliminated (co)variances from
$NACOV
. Thus, it is necessary to use poolSat()
to obtain
the appropriate summary statistics for any particular SEM (see Examples).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Lee, T., & Cai, L. (2012). Alternative multiple imputation inference for mean and covariance structure modeling. Journal of Educational and Behavioral Statistics, 37(6), 675–702. doi:10.3102/1076998612458320
Chung, S., & Cai, L. (2019). Alternative multiple imputation inference for categorical structural equation modeling, Multivariate Behavioral Research, 54(3), 323–337. doi:10.1080/00273171.2018.1523000
See Also
lavaan.mi()
for traditional method (fit SEM to each imputation,
pool results afterward).
Examples
data(HS20imps) # import a list of 20 imputed data sets
## fit saturated model to imputations, pool those results
impSubset1 <- lapply(HS20imps, "[", i = paste0("x", 1:9)) # only modeled variables
(prePooledData <- poolSat(impSubset1))
## Note: no means were returned (default lavOption() is meanstructure=FALSE)
(prePooledData <- poolSat(impSubset1, meanstructure = TRUE))
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
## fit model to summary statistics in "prePooledData"
fit <- cfa(HS.model, data = prePooledData, std.lv = TRUE)
## By default, the "Scaled" column provides a "scaled.shifted" test
## statistic that maintains an approximately nominal Type I error rate.
summary(fit, fit.measures = TRUE, standardized = "std.all")
## Note that this scaled statistic does NOT account for deviations from
## normality, because the default normal-theory standard errors were
## requested when running poolSat(). See below about non-normality.
## Alternatively, "Browne's residual-based (ADF) test" is also available:
lavTest(fit, test = "browne.residual.adf", output = "text")
## Optionally, save the saturated-model lavaan.mi object, which
## could be helpful for diagnosing convergence problems per imputation.
satFit <- poolSat(impSubset1, return.fit = TRUE)
## FITTING MODELS TO DIFFERENT (SUBSETS OF) VARIABLES
## If you only want to analyze a subset of these variables,
mod.vis <- 'visual =~ x1 + x2 + x3'
## you will get an error:
try(
fit.vis <- cfa(mod.vis, data = prePooledData) # error
)
## As explained in the "Note" section, you must use poolSat() again for
## this subset of variables
impSubset3 <- lapply(HS20imps, "[", i = paste0("x", 1:3)) # only modeled variables
visData <- poolSat(impSubset3)
fit.vis <- cfa(mod.vis, data = visData) # no problem
## OTHER lavaan OPTIONS
## fit saturated MULIPLE-GROUP model to imputations
impSubset2 <- lapply(HS20imps, "[", i = c(paste0("x", 1:9), "school"))
(prePooledData2 <- poolSat(impSubset2, group = "school",
## request standard errors that are ROBUST
## to violations of the normality assumption:
se = "robust.sem"))
## Nonnormality-robust standard errors are implicitly incorporated into the
## pooled weight matrix (NACOV= argument), so they are
## AUTOMATICALLY applied when fitting the model:
fit.config <- cfa(HS.model, data = prePooledData2, group = "school",
std.lv = TRUE)
## standard errors and chi-squared test of fit both robust to nonnormality
summary(fit.config)
## CATEGORICAL OUTCOMES
## discretize the imputed data, for an example of 3-category data
HS3cat <- lapply(impSubset1, function(x) {
as.data.frame( lapply(x, cut, breaks = 3, labels = FALSE) )
})
## pool polychoric correlations and thresholds
(prePooledData3 <- poolSat(HS3cat, ordered = paste0("x", 1:9)))
fitc <- cfa(HS.model, data = prePooledData3, std.lv = TRUE)
summary(fitc)
## Optionally, use unweighted least-squares estimation. However,
## you must first REMOVE the pooled weight matrix (WLS.V= argument)
## or replace it with an identity matrix of the same dimensions:
prePooledData4 <- prePooledData3
prePooledData4$WLS.V <- NULL
## or prePooledData4$WLS.V <- diag(nrow(prePooledData3$WLS.V))
fitcu <- cfa(HS.model, data = prePooledData4, std.lv = TRUE, estimator = "ULS")
## Note that the SEs and test were still appropriately corrected:
summary(fitcu)
Standardized Pooled Parameter Estimates
Description
This function calculates pooled parameter estimates from a lavaan model fitted to multiple imputed data sets, then transforms the pooled estimates and their SEs using the delta method.
Usage
standardizedSolution.mi(
object,
return.vcov = FALSE,
omit.imps = c("no.conv", "no.se"),
...
)
Arguments
object |
An object of class |
return.vcov |
|
omit.imps |
|
... |
Arguments passed to |
Value
A data.frame
containing standardized model parameters, analogous to
lavaan::standardizedSolution()
. Delta-method SEs and CIs rely on
asymptotic theory, so only Wald z tests are available, analogous to
setting parameterEstimates.mi(fit, asymptotic = TRUE)
.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
parameterEstimates.mi()
for pooling unstandardized parameter estimates,
which can also add standardized point estimates to indicate effect size.
Examples
data(HS20imps) # import a list of 20 imputed data sets
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
## fit model to 20 imputed data sets
fit <- cfa.mi(HS.model, data = HS20imps)
standardizedSolution.mi(fit) # default: type = "std.all"
## only standardize latent variables:
standardizedSolution.mi(fit, type = "std.lv",
output = "text") # display like summary()