Title: | A Tidy Interface for Simulating Multivariate Data |
Version: | 0.2.2 |
Description: | Provides pipe-friendly (%>%) wrapper functions for MASS::mvrnorm() to create simulated multivariate data sets with groups of variables with different degrees of variance, covariance, and effect size. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
Imports: | dplyr, tibble, MASS, purrr, rlang, assertthat |
RoxygenNote: | 7.2.3 |
URL: | https://github.com/Aariq/holodeck |
BugReports: | https://github.com/Aariq/holodeck/issues |
Suggests: | testthat, covr, knitr, rmarkdown, mice, ggplot2 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2023-08-25 21:43:57 UTC; ericscott |
Author: | Eric Scott |
Maintainer: | Eric Scott <scottericr@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2023-08-25 22:00:06 UTC |
holodeck: A Tidy Interface for Simulating Multivariate Data
Description
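Provides pipe-friendly (%>%) wrapper functions for MASS::mvrnorm() to create simulated multivariate data sets with groups of variables with different degrees of variance, covariance, and effect size.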
Author(s)
Maintainer: Eric Scott scottericr@gmail.com (ORCID)
See Also
Useful links:
https://github.com/Aariq/holodeck
Report bugs at https://github.com/Aariq/holodeck/issues
Definition operator
Description
Internally, this package uses the definition operator, :=, to make assignments that require computing on the LHS.
Arguments
x |
An object to test. |
lhs , rhs |
Expressions for the LHS and RHS of the definition. |
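As an illustration of computing on the LHS, the sketch below builds a name at run time and uses it on the left of ':='. It assumes the 'rlang' package (listed under Imports) is installed. The variable names are hypothetical and the snippet is not taken from the package's source.

```r
library(rlang)

# Hypothetical column name assembled at run time
col_name <- paste0("correlated", "_", 1)

# `=` cannot take a computed name on its left-hand side, but `:=` can:
# `!!` unquotes col_name so that its value becomes the element name.
out <- list2(!!col_name := c(0.1, 0.2, 0.3))

names(out)
```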
Pipe friendly wrapper to 'diag(x) <- value'
Description
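Pipe-friendly wrapper to 'diag(x) <- value'. It sets the diagonal of a matrix and returns the modified matrix, so the assignment can be used inside a pipeline.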
Usage
set_diag(x, value)
Arguments
x |
a matrix |
value |
either a single value or a vector of length equal to the diagonal of 'x'. |
Value
a matrix
Examples
library(dplyr)
matrix(0, 3, 3) %>%
  set_diag(1)
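For illustration, a wrapper of this kind can be sketched in base R. The name 'set_diag_sketch' is hypothetical, and this shows the general approach rather than the package's own source.

```r
# Hypothetical stand-in for set_diag(): assign the diagonal, then return the
# modified matrix so the call can sit in the middle of a pipeline.
set_diag_sketch <- function(x, value) {
  diag(x) <- value
  x
}

# A single value is recycled along the diagonal
m <- set_diag_sketch(matrix(0, 3, 3), 1)

# A vector supplies one value per diagonal cell
v <- set_diag_sketch(matrix(0, 3, 3), c(1, 2, 3))
```

Returning 'x' explicitly is what makes the function usable in a pipe, because a bare 'diag(x) <- value' modifies 'x' in place and returns the assigned value invisibly.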
Simulate categorical data
Description
This is a simple wrapper that creates a tibble with 'n_obs' rows and a single grouping column, named 'group' by default. It warns if there are fewer than three replicates per group.
Usage
sim_cat(.data = NULL, n_obs = NULL, n_groups, name = "group")
Arguments
.data |
An optional dataframe. If a dataframe is supplied, simulated categorical data will be added to the dataframe. Either '.data' or 'n_obs' must be supplied. |
n_obs |
Total number of observations/rows to simulate if '.data' is not supplied. |
n_groups |
How many groups or treatments to simulate. |
name |
The column name for the grouping variable. Defaults to "group". |
Details
To-do:
- Make this optionally create multiple categorical variables as being nested or crossed or random
Value
a tibble
See Also
Other multivariate normal functions: sim_covar(), sim_discr()
Examples
df <- sim_cat(n_obs = 30, n_groups = 3)
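The behaviour described above can be approximated in base R as follows. This is a sketch under assumptions: the function name 'sim_cat_sketch', the letter labels for groups, and the wording of the warning are all hypothetical, and a plain data frame is returned instead of a tibble.

```r
# Hypothetical approximation of sim_cat() using base R only
sim_cat_sketch <- function(n_obs, n_groups, name = "group") {
  # Warn when groups would have fewer than three replicates
  if (n_obs / n_groups < 3) {
    warning("Fewer than three replicates per group")
  }
  # Assumption: groups are labelled "a", "b", ... and are as balanced as possible
  labels <- sort(rep(letters[seq_len(n_groups)], length.out = n_obs))
  out <- data.frame(labels, stringsAsFactors = FALSE)
  names(out) <- name
  out
}

df_sketch <- sim_cat_sketch(n_obs = 30, n_groups = 3)
table(df_sketch$group)
```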
Simulate co-varying variables
Description
Adds a group of variables (columns) with a given variance and covariance to a data frame or tibble
Usage
sim_covar(.data = NULL, n_obs = NULL, n_vars, var, cov, name = NA, seed = NA)
Arguments
.data |
An optional dataframe. If a dataframe is supplied, the simulated variables will be added to the dataframe. Either '.data' or 'n_obs' must be supplied. |
n_obs |
Total number of observations/rows to simulate if '.data' is not supplied. |
n_vars |
Number of variables to simulate. |
var |
Variance used to construct variance-covariance matrix. |
cov |
Covariance used to construct variance-covariance matrix. |
name |
An optional name to be appended to the column names in the output. |
seed |
An optional seed for random number generation. If 'NA' (default) a random seed will be used. |
Value
a tibble
See Also
Other multivariate normal functions: sim_cat(), sim_discr()
Examples
library(dplyr)
sim_cat(n_obs = 30, n_groups = 3) %>%
  sim_covar(n_vars = 5, var = 1, cov = 0.5, name = "correlated")
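To make the roles of 'var' and 'cov' concrete, the sketch below builds a variance-covariance matrix from the two values and passes it to MASS::mvrnorm() directly. The zero mean vector and the seed are assumptions made for illustration, and the code is not necessarily how the package constructs its matrix internally.

```r
library(MASS)  # provides mvrnorm()

n_vars <- 5
var <- 1
cov <- 0.5

# Fill the whole matrix with the covariance, then put the variance on the diagonal
sigma <- matrix(cov, nrow = n_vars, ncol = n_vars)
diag(sigma) <- var

set.seed(42)  # arbitrary seed, for reproducibility only
# Assumption: every variable has mean zero
draws <- mvrnorm(n = 30, mu = rep(0, n_vars), Sigma = sigma)

dim(draws)
```

With a single 'var' and a single 'cov', every pair of variables in the group shares the same covariance, so the matrix must be positive definite for mvrnorm() to accept it. This holds for the values used here.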
Simulate co-varying variables with different means by group
Description
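Adds a group of variables (columns) with a given variance and covariance to a data frame, with means that differ between the groups defined by its grouping variable.
To-do: make this work with 'dplyr::group_by()' instead of 'group ='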
Usage
sim_discr(.data, n_vars, var, cov, group_means, name = NA, seed = NA)
Arguments
.data |
A dataframe containing a grouping variable column. |
n_vars |
Number of variables to simulate. |
var |
Variance used to construct variance-covariance matrix. |
cov |
Covariance used to construct variance-covariance matrix. |
group_means |
A vector of means, one per group, so its length must equal the number of groups in the grouping variable. |
name |
An optional name to be appended to the column names in the output. |
seed |
An optional seed for random number generation. If 'NA' (default) a random seed will be used. |
Value
a tibble
See Also
Other multivariate normal functions: sim_cat(), sim_covar()
Examples
library(dplyr)
sim_cat(n_obs = 30, n_groups = 3) %>%
  group_by(group) %>%
  sim_discr(n_vars = 5, var = 1, cov = 0.5, group_means = c(-1, 0, 1), name = "descr")
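The idea of one mean per group can be sketched with MASS::mvrnorm() alone: each group's rows are drawn from the same variance-covariance matrix but centred on that group's mean. The group labels, object names, and seed below are hypothetical, and the snippet is an illustration of the idea, not the package's implementation.

```r
library(MASS)  # provides mvrnorm()

groups <- rep(c("a", "b", "c"), each = 10)  # hypothetical grouping column
group_means <- c(a = -1, b = 0, c = 1)      # one mean per group
n_vars <- 5

# Shared variance-covariance matrix: var = 1, cov = 0.5
sigma <- matrix(0.5, nrow = n_vars, ncol = n_vars)
diag(sigma) <- 1

set.seed(1)  # arbitrary seed, for reproducibility only
# Draw each group's rows around that group's mean
blocks <- lapply(names(group_means), function(g) {
  n_g <- sum(groups == g)
  mvrnorm(n = n_g, mu = rep(group_means[[g]], n_vars), Sigma = sigma)
})
sim <- do.call(rbind, blocks)

# Overall mean of each group's block of simulated values
sapply(blocks, mean)
```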
Simulate missing values
Description
Takes a data frame and randomly replaces a user-supplied proportion of values with 'NA'.
Usage
sim_missing(.data, prop, seed = NA)
Arguments
.data |
A dataframe. |
prop |
Proportion of values to be set to 'NA'. |
seed |
An optional seed for random number generation. If 'NA' (default) a random seed will be used. |
Value
a dataframe with NAs
Examples
library(dplyr)
df <- sim_cat(n_obs = 10, n_groups = 2) %>%
  sim_covar(n_vars = 10, var = 1, cov = 0.5) %>%
  sim_missing(0.05)
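A base R sketch of the same idea follows, for an all-numeric data frame. It assumes that an exact number of cells, round(prop * number of cells), is chosen uniformly at random. The package may select cells differently, and the object names are hypothetical.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Toy 10 x 10 numeric data frame standing in for simulated data
toy <- as.data.frame(matrix(rnorm(100), nrow = 10))
prop <- 0.05

# Work on a matrix so that cells can be addressed by a single index
vals <- as.matrix(toy)
n_missing <- round(prop * length(vals))

# Pick cells without replacement and blank them out
vals[sample(length(vals), n_missing)] <- NA
toy_missing <- as.data.frame(vals)

# Realised proportion of missing values
mean(is.na(toy_missing))
```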