--- title: "Simulation of confidence intervals." output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Simulation of confidence intervals.} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Setting up the simulations We use the dataset `bfi` from the package `psych` together with `lavaan` to estimate some realistic factor loadings $\lambda$ and standard deviations $\sigma$. ```r model <- c("y =~ A1 + A2 + A3 + A4 + A5") fit <- lavaan::cfa(model, data = psych::bfi) coefs <- lavaan::lavInspect(fit, what = "x") lambda <- abs(c(coefs$lambda * sqrt(as.numeric(coefs$psi)))) sigma <- sqrt(diag(lavaan::lavInspect(fit, what = "x")$theta)) ``` We take the absolute value of the `lambda` vector as the agreement data contains reverse-coded items. ## Comparing confidence intervals coverages and lengths We compare five confidence intervals, all without transformations. The `adf` interval is the asymptotic distribution-free interval, the `ell` interval is the interval based on elliptical distributions and a kurtosis correction, the `ell_par` is the elliptical interval assuming a parallel model. The same comments hold for `norm` (assuming normal data) and `norm_par` (assuming parallel normal data). ```r library("alphaci") library("future.apply") plan(multisession, workers = availableCores() - 2) set.seed(313) n_reps <- 10000 k <- 5 latent <- \(n) extraDistr::rlaplace(n) / sqrt(2) true <- alphaci:::alpha(sigma, lambda) ``` In this simulation we normal error terms and a Laplace-distributed latent variable. This one has excess kurtosis $3$, which caries over in large part to the data. `k` is the number of questions ands `n_reps` is the number of simulations. ```r success <- \(ci) true <= ci[2] & true >= ci[1] len <- \(ci) ci[2] - ci[1] simulations <- \(n) { results <- future.apply::future_replicate(n_reps, { x <- alphaci:::simulate_congeneric(n, k, sigma, lambda, latent = latent) cis <- rbind(adf = alphaci(x, type = "adf"), adf_par = alphaci(x, type = "adf", parallel = TRUE), ell = alphaci(x, type = "elliptical"), ell_par = alphaci(x, type = "elliptical", parallel = TRUE), norm = alphaci(x, type = "normal"), norm_par = alphaci(x, type = "normal", parallel = TRUE) ) c(cov = apply(cis, 1, success), len = apply(cis, 1, len)) }, future.seed = TRUE) rowMeans(results) } ``` Let's check out the results when $n= 10$. ```r simulations(10) #> cov.adf cov.adf_par cov.ell cov.ell_par cov.norm cov.norm_par len.adf len.adf_par #> 0.8745000 0.7942000 0.9513000 0.9566000 0.9223000 0.9297000 0.7680810 55.3239940 #> len.ell len.ell_par len.norm len.norm_par #> 1.0238478 1.0499270 0.9113473 0.9342908 ``` It appears that the kurtosis corrections work well, at least for small sample size. Let's see how they perform when $n$ increases. ```r nn <- c(5, 10, 20, 30, 40, 50, 100, 200, 500, 1000, 2000, 5000) results <- sapply(nn, simulations) ``` Plotting the coverages, we find, where `1` is asymptotically distribution-free, `2` is elliptical, `3` is paralell and elliptical, `4` is `normal` and `5` is parallel and normal. ```r matplot(nn, t(results[1:5, ]), xlab = "n", ylab = "Coverage", type = "b", log = "x") abline(h = 0.95, lty = 2) ``` plot of chunk figure Hence the kurtosis correction intervals have better coverage than the `adf` interval when $n\leq 50$ and outperforms the normal theory intervals for all $n$. If this observation is general remains to be seen.