Title: Design of Experiments Suite: Generate and Evaluate Optimal Designs
Date: 2025-04-07
Version: 1.8.2
Description: Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of blocked and split/split-split/.../N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Provides a framework to evaluate power using functions provided in other packages or written by the user. Includes a Shiny graphical user interface that displays the underlying code used to create and evaluate the design to improve ease-of-use and make analyses more reproducible. For details, see Morgan-Wall et al. (2021) <doi:10.18637/jss.v099.i01>.
Copyright: Institute for Defense Analyses
Depends: R (≥ 4.1.0)
License: GPL-3
RoxygenNote: 7.3.2
Imports: utils, iterators, stats, lme4, Rcpp (≥ 0.11.0), foreach, doParallel, survival, future, car, viridis, magrittr, lmerTest, methods, progress, doRNG, doFuture, progressr, geometry, digest
LinkingTo: Rcpp, RcppEigen
Suggests: testthat, mbest, ggplot2, lmtest, cli, gridExtra, rintrojs, shinythemes, shiny, shinyjs, gt, shinytest2
Encoding: UTF-8
URL: https://github.com/tylermorganwall/skpr, https://tylermorganwall.github.io/skpr/
BugReports: https://github.com/tylermorganwall/skpr/issues
NeedsCompilation: yes
Packaged: 2025-04-07 19:25:22 UTC; tmorganw
Author: Tyler Morgan-Wall [aut, cre], George Khoury [aut]
Maintainer: Tyler Morgan-Wall <tylermw@gmail.com>
Repository: CRAN
Date/Publication: 2025-04-25 18:40:02 UTC

re-export magrittr pipe operator

Description

re-export magrittr pipe operator


Calculate CI bounds on Monte Carlo

Description

Calculates CI bounds for Monte Carlo power

Usage

add_ci_bounds_mc_power(power_results, nsim, conf = 0.95)

Value

Power data.frame with conf intervals


Alias terms

Description

Creates alias terms for Alias-optimal designs

Usage

aliasmodel(formula, power)

Arguments

formula

The formula to be expanded

Value

Returns aliased model terms from formula


Generates Anticipated Coefficients from delta

Description

Generates Anticipated Coefficients from delta parameter The logic for generating anticipated coefficients from delta varies with glm family.

Usage

anticoef_from_delta(default_coef, delta, glmfamily)

Arguments

default_coef

a vector of default coefficients, from gen_anticoef

delta

the user-input delta parameter, must be a numeric vector of length 1 or 2

glmfamily

the user-supplied glmfamily, either a string or a glmfamily object

Value

Anticipated coefficients.


Generates Anticipated Coefficients from delta for eval_design_suvival_mc

Description

Generates Anticipated Coefficients from delta parameter The logic for generating anticipated coefficients from delta varies with glm family.

Usage

anticoef_from_delta_surv(default_coef, delta, distribution)

Arguments

default_coef

a vector of default coefficients, from gen_anticoef

delta

the user-input delta parameter, must be a numeric vector of length 1 or 2

distribution

the user-supplied distribution, either a string or a survreg distribution object

Value

Anticipated coefficients.


Find block sizes in column

Description

Returns blocks

Usage

blockingstructure(blocklist)

Arguments

blocklist

Block list

Value

Returns the blocking structure given a vector of block ids


Calculate Conservative Anticipated Coefficients

Description

Calculate Conservative Anticipated Coefficients

Usage

calc_conservative_anticoef(results, effectsize)

Arguments

results

The power results matrix

effectsize

The effectsize.

Value

Vector of conservative anticipated coefficients


Calculate Interaction Degrees of Freedom

Description

Calculate interaction degrees of freedom for split-plot designs

Usage

calc_interaction_degrees(
  design,
  model,
  contrast,
  nointercept,
  split_layers,
  split_degrees,
  split_plot_structure
)

Arguments

design

The design matrix

model

The model

contrast

The contrast

split_layers

The layer of split plots for each main effect term

split_degrees

The number of degrees of freedom for each main effect term

Value

Degrees of freedom vector


Calculate block sizes

Description

Calculate block size vector

Usage

calcblocksizes(trials, blocksize)

Arguments

trials

Number of trials in design

blocksize

The desired size of each block

Value

The blocksize vector


Calculate Non-Centrality Parameter

Description

Calculates the non-centrality parameter for the model matrix

Usage

calcnoncentralparam(X, L, b, vinv = NULL)

Arguments

X

The model matrix

L

The parameter matrix/vector

b

The anticipated coefficients

vinv

variance-covariance matrix

Value

The non-centrality parameter for the F test


Determine Nesting Level of Blocks

Description

Calculates if a block is fully nested within another block, and what the highest level of nesting is. 1 indicates the block isn't nested within another block–just within the intercept.

Usage

calculate_block_nesting(blockgroups, blockstructure)

Value

Vector of numbers indicating the hierarchy


Calculate Degrees of Freedom

Description

Calculates the degrees of freedom adjustment for models with random effects from Penheiro and Bates, pg. 91

Usage

calculate_degrees_of_freedom(
  run_matrix_processed,
  nointercept,
  model,
  contrasts
)

Arguments

run_matrix_processed

The design

nointercept

Whether the intercept term is present

Value

Vector of numbers indicating split plot layers


Calculate G Efficiency

Description

Either calculates G-Efficiency by Monte Carlo sampling from the design space (ignoring constraints), searching for the maximum point (slower but higher quality), or by using a user-specified candidate set for the design space (fastest).

Usage

calculate_gefficiency(
  design,
  calculation_type = "random",
  randsearches = 10000,
  design_space_mm = NULL
)

Value

Normalized run matrix


Calculate level vector

Description

Calculate level vector

Usage

calculate_level_vector(design, model, nointercept)

Arguments

design

Design to be analyzed.

model

The model.

nointercept

Whether an intercept is included.

Value

A vector of levels.

Examples

#We can pass either the output of gen_design or eval_design to plot_correlations

Calculate Power Curves

Description

Calculate and optionally plot power curves for different effect sizes and trial counts. This function takes a

Usage

calculate_power_curves(
  trials,
  effectsize = 1,
  candidateset = NULL,
  model = NULL,
  alpha = 0.05,
  gen_args = list(),
  eval_function = "eval_design",
  eval_args = list(),
  random_seed = 123,
  iterate_seed = FALSE,
  plot_results = TRUE,
  auto_scale = TRUE,
  x_breaks = NULL,
  y_breaks = seq(0, 1, by = 0.1),
  ggplot_elements = list()
)

Arguments

trials

A numeric vector indicating the trial(s) used when computing the power curve. If a single value, this will be fixed and only 'effectsize' will be varied.

effectsize

Default '1'. A numeric vector indicating the effect size(s) used when computing the power curve. If a single value, this will be fixed and only 'trials' will be varied. If using a length-2 effect size with 'eval_design_mc()' (such as a binomial probability interval), the effect size pairs can be input as entries in a list.

candidateset

Default 'NULL'. The candidate set (see 'gen_design()' documentation for more information). Provided to aid code completion: can also be provided in 'gen_args'.

model

Default 'NULL'. The model (see 'gen_design()' and 'eval_design()' documentation for more information). Provided to aid code completion: can also be provided in 'gen_args'/'eval_args'.

alpha

Default '0.05'. The allowable Type-I error rate (see 'eval_design()' documentation for more information). Provided to aid code completion: can also be provided in 'eval_args'.

gen_args

Default 'list()'. A list of argument/value pairs to specify the design generation parameters for 'gen_design()'.

eval_function

Default '"eval_design"'. A string (or function) specifying the skpr power evaluation function. Can also be '"eval_design_mc"', '"eval_design_survival_mc"', and '"eval_design_custom_mc"'.

eval_args

Default 'list()'. A list of argument/value pairs to specify the design power evaluation parameters for 'eval_function'.

random_seed

Default '123'. The random seed used to generate and then evaluate the design. The seed is set right before design generation.

iterate_seed

Default 'FALSE'. This will iterate the random seed with each new design. Set this to 'TRUE' to add more variability to the design generation process.

plot_results

Default 'TRUE'. Whether to print out a plot of the power curves in addition to the data frame of results. Requires 'ggplot2'.

auto_scale

Default 'TRUE'. Whether to automatically scale the y-axis to 0 and 1.

x_breaks

Default 'NULL', automaticly generated by ggplot2.

y_breaks

Default 'seq(0,1,by=0.1)'. Y-axis breaks.

ggplot_elements

Default 'list()'. Extra 'ggplot2' elements to customize the plot, passed in as elements in a list.

Value

A data.frame of power values with design generation information.

Examples

if(skpr:::run_documentation()) {
cand_set = expand.grid(brew_temp = c(80, 85, 90),
                      altitude = c(0, 2000, 4000),
                      bean_sun = c("low", "partial", "high"))
#Plot power for a linear model with all interactions
calculate_power_curves(trials=seq(10,60,by=5),
                      candidateset = cand_set,
                      model = ~.*.,
                      alpha = 0.05,
                      effectsize = 1,
                      eval_function = "eval_design") |>
 head(30)

}
if(skpr:::run_documentation()) {
#Add multiple effect sizes
calculate_power_curves(trials=seq(10,60,by=1),
                      candidateset = cand_set,
                      model = ~.*.,
                      alpha = 0.05,
                      effectsize = c(1,2),
                      eval_function = "eval_design") |>
 head(30)
}
if(skpr:::run_documentation()) {
#Generate power curve for a binomial model
calculate_power_curves(trials=seq(50,150,by=10),
                      candidateset = cand_set,
                      model = ~.,
                      effectsize = c(0.6,0.9),
                      eval_function = "eval_design_mc",
                      eval_args = list(nsim = 100, glmfamily = "binomial")) |>
 head(30)
}
if(skpr:::run_documentation()) {
#Generate power curve for a binomial model and multiple effect sizes
calculate_power_curves(trials=seq(50,150,by=10),
                      candidateset = cand_set,
                      model = ~.,
                      effectsize = list(c(0.5,0.9),c(0.6,0.9)),
                      eval_function = "eval_design_mc",
                      eval_args = list(nsim = 100, glmfamily = "binomial")) |>
 head(30)
}

Calculate Split Plot Columns

Description

Calculates which columns correspond to which layers of randomization restriction

Usage

calculate_split_columns(run_matrix_processed, blockgroups, blockstructure)

Arguments

run_matrix_processed

The design

Value

Vector of numbers indicating split plot layers


Calculate V from Blocks

Description

Calculates the V matrix from the block structure

Usage

calculate_v_from_blocks(
  nrow_design,
  blockgroups,
  blockstructure,
  varianceratios
)

Arguments

nrow_design

The number of runs in the design

blockgroups

List indicating the size of each block for each layer

blockstructure

Matrix indicating the block structure from the rownames

Value

Variance-Covariance Matrix


Calculate Power

Description

Calculates the power of the model given the non-centrality parameter

Usage

calculatepower(X, L, lambda, alpha, degrees_of_freedom)

Arguments

X

The model matrix

L

The parameter matrix/vector

lambda

The non-centrality parameter for the F test

alpha

the specified type-I error

degrees_of_freedom

The number of degrees of freedom.

Value

The power for a given parameter L, given the


check_for_suggest_packages

Description

checks for suggests

Usage

check_for_suggest_packages(packages)

Value

none


Check Model Formula Validity

Description

Check Model Formula Validity

Usage

check_model_validity(model)

Construct Run Matrix given rows

Description

Returns number of levels prior to each parameter

Usage

constructRunMatrix(rowIndices, candidateList, augment = NULL)

Value

Returns a vector consisting of the number of levels preceeding each parameter (including the intercept)


Orthonormal Contrast Generator

Description

Generates orthonormal (orthogonal and normalized) contrasts. Each row is the vertex of an N-dimensional simplex. The only exception are contrasts for the 2-level case, which return 1 and -1.

Usage

contr.simplex(n, size = NULL)

Arguments

n

The number of levels in the catagorical variable. If this is a factor or character vector, 'n' will be 'length(n)'

size

Default '1'. The length of the simplex vector.

Value

A matrix of Orthonormal contrasts.

Examples

contr.simplex(4)

Convert Block Column to Rownames

Description

Detect externally generated blocking columns and convert to rownames

Usage

convert_blockcolumn_rownames(
  RunMatrix,
  blocking,
  varianceratios,
  verbose = FALSE
)

Arguments

RunMatrix

The run matrix

blocking

Whether random effects should be included.

varianceratios

A vector of variance ratios for each level of restricted randomization

Value

Row-name encoded blocked run matrix


Converts dot operator to terms

Description

Converts the dot operator '.' in a formula to the linear terms in the model. Includes interactions (e.g. .*.)

Usage

convert_model_dots(design, model, splitplotdesign = NULL)

Arguments

design

The design

model

Base model

splitplotdesign

split plot design data.frame

Value

New model with dot operator replaced


Converts dot operator to terms

Description

Converts the row names to a variance-covariance matrix (using the user-supplied variance ratio)

Usage

convert_rownames_to_covariance(
  run_matrix_processed,
  varianceratios,
  user_specified_varianceratio
)

Arguments

run_matrix_processed

The design

Value

Variance-covariance matrix V


Compute convex hull in half-space form

Description

Compute convex hull in half-space form

Usage

convhull_halfspace(points)

Arguments

points

A matrix or data frame of numeric coordinates, size n x d

Value

A list with elements: - A: a matrix of size m x d (m = number of facets) - b: a length-m vector so that x is inside the hull if and only if A - volume: Total volume of the convex hull


Detects disallowed combinations in candidate set

Description

Detects disallowed combinations in candidate set

Usage

detect_disallowed_combinations(candidateset)

Arguments

candidateset

Data frame

Value

Returns a logical


Detect and list disallowed combinations in candidate set

Description

Detects and list disallowed combinations in candidate set

Usage

disallowed_combinations(candidateset)

Arguments

candidateset

Block list

Value

Returns a list of the disallowed combinations


Calculate Effect Power

Description

Calculates the effect power given the anticipated coefficients and the type-I error

Usage

effectpower(
  RunMatrix,
  levelvector,
  anticoef,
  alpha,
  vinv = NULL,
  degrees = NULL
)

Arguments

RunMatrix

The model matrix

levelvector

The number of levels in each parameter (1st is always the intercept)

anticoef

The anticipated coefficients

alpha

the specified type-I error

vinv

The V inverse matrix

degrees

Degrees of freedom

Value

The effect power for the parameters


Fit Anova for Effect Power Calculation in Monte Carlo

Description

Calculates the p-values for the effect power calculation in Monte Carlo

Usage

effectpowermc(
  fit,
  type = "III",
  test = "Pr(>Chisq)",
  model_formula = NULL,
  firth = FALSE,
  glmfamily = "gaussian",
  effect_terms = NULL,
  RunMatrixReduced = NULL,
  method = NULL,
  contrastslist = contrastslist,
  effect_anova = FALSE,
  ...
)

Arguments

fit

Fit from regression

type

Default 'III'

test

Default 'Pr(>Chisq)'.

...

Additional arguments to pass to car::Anova

Value

p-values


Calculate Power of an Experimental Design

Description

Evaluates the power of an experimental design, for normal response variables, given the design's run matrix and the statistical model to be fit to the data. Returns a data frame of parameter and effect powers. Designs can consist of both continuous and categorical factors. By default, eval_design assumes a signal-to-noise ratio of 2, but this can be changed with the effectsize or anticoef parameters.

Usage

eval_design(
  design,
  model = NULL,
  alpha = 0.05,
  blocking = NULL,
  anticoef = NULL,
  effectsize = 2,
  varianceratios = NULL,
  contrasts = contr.sum,
  conservative = FALSE,
  reorder_factors = FALSE,
  detailedoutput = FALSE,
  advancedoptions = NULL,
  high_resolution_candidate_set = NA,
  moment_sample_density = 20,
  ...
)

Arguments

design

The experimental design. Internally, eval_design rescales each numeric column to the range [-1, 1], so you do not need to do this scaling manually.

model

The model used in evaluating the design. If this is missing and the design was generated with skpr, the generating model will be used. It can be a subset of the model used to generate the design, or include higher order effects not in the original design generation. It cannot include factors that are not present in the experimental design.

alpha

Default '0.05'. The specified type-I error.

blocking

Default 'NULL'. If 'TRUE', eval_design will look at the rownames (or blocking columns) to determine blocking structure. Default FALSE.

anticoef

The anticipated coefficients for calculating the power. If missing, coefficients will be automatically generated based on the effectsize argument.

effectsize

Default '2'. The signal-to-noise ratio. For continuous factors, this specifies the difference in response between the highest and lowest levels of the factor (which are -1 and +1 after eval_design normalizes the input data), assuming that the root mean square error is 1. If you do not specify anticoef, the anticipated coefficients will be half of effectsize. If you do specify anticoef, effectsize will be ignored.

varianceratios

Default 'NULL'. The ratio of the whole plot variance to the run-to-run variance. If not specified during design generation, this will default to 1. For designs with more than one subplot this ratio can be a vector specifying the variance ratio for each subplot (comparing to the run-to-run variance). Otherwise, it will use a single value for all strata.

contrasts

Default contr.sum. The function to use to encode the categorical factors in the model matrix. If the user has specified their own contrasts for a categorical factor using the contrasts function, those will be used. Otherwise, skpr will use 'contr.sum'.

conservative

Specifies whether default method for generating anticipated coefficents should be conservative or not. 'TRUE' will give the most conservative estimate of power by setting all but one (or multiple if they are equally low) level in each categorical factor's anticipated coefficients to zero. Default 'FALSE'.

reorder_factors

Default 'FALSE'. If 'TRUE', the levels will be reordered to generate the most conservative calculation of effect power. The function searches through all possible reference levels for a given factor and chooses the one that results in the lowest effect power. The reordering will be presenting in the output when 'detailedoutput = TRUE'.

detailedoutput

If 'TRUE“, return additional information about evaluation in results. Default FALSE.

advancedoptions

Default 'NULL'. A named list with parameters to specify additional attributes to calculate. Options: 'aliaspower' gives the degree at which the Alias matrix should be calculated.

high_resolution_candidate_set

Default 'NA'. If you have continuous numeric terms and disallowed combinations, the closed-form I-optimality value cannot be calculated and must be approximated by numeric integration. This requires sampling the allowed space densely, but most candidate sets will provide a sparse sampling of allowable points. To work around this, skpr will generate a convex hull of the numeric terms for each unique combination of categorical factors to generate a dense sampling of the space and cache that value internally, but this is a slow calculation and does not support non-convex candidate sets. To speed up moment matrix calculation, pass a higher resolution version of your candidate set here with the disallowed combinations already applied. If you generated your design externally from skpr, there are disallowed combinations in your design, and need correct I-optimalituy values, you must pass your candidate set here.

moment_sample_density

Default '20'. The density of points to sample when calculating the moment matrix to compute I-optimality. Only required if the design was generated outside of skpr and there are disallowed combinations. It is much faster and potentially more accurate to provide your own candidate set with higher resolution continuous factors to 'high_resolution_candidate_set'.'

...

Additional arguments.

Details

This function evaluates the power of experimental designs.

If the design is has no blocking or restrictions on randomization, the model assumed is:

y = X \beta + \epsilon.

If the design is a split-plot design, the model is as follows:

y = X \beta + Z bi + \epsilonij,

Here, y is the vector of experimental responses, X is the model matrix, \beta is the vector of model coefficients, Z_{i} are the blocking indicator, b_{i} is the random variable associated with the ith block, and \epsilon is a random variable normally distributed with zero mean and unit variance (root-mean-square error is 1.0).

eval_design calculates both parameter power as well as effect power, defined as follows:

1) Parameter power is the probability of rejecting the hypothesis H_0 : \beta_i = 0, where \beta_i is a single parameter in the model 2) Effect power is the probability of rejecting the hypothesis H_0 : \beta_{1} = \beta_{2} = ... = \beta_{n} 0 for all n coefficients for a categorical factor.

The two power types are equivalent for continuous factors and two-level categorical factors, but they will differ for categorical factors with three or more levels.

For split-plot designs, the degrees of freedom are allocated to each term according to the algorithm given in "Mixed-Effects Models in S and S-PLUS" (Pinheiro and Bates, pp. 91).

When using conservative = TRUE, eval_design first evaluates the power with the default (or given) coefficients. Then, for each multi-level categorical, it sets all coefficients to zero except the level that produced the lowest power, and then re-evaluates the power with this modified set of anticipated coefficients. If there are two or more equal power values in a multi-level categorical, two of the lowest equal terms are given opposite sign anticipated coefficients and the rest (for that categorical factor) are set to zero.

Value

A data frame with the parameters of the model, the type of power analysis, and the power. Several design diagnostics are stored as attributes of the data frame. In particular, the modelmatrix attribute contains the model matrix that was used for power evaluation. This is especially useful if you want to specify the anticipated coefficients to use for power evaluation. The model matrix provides the order of the model coefficients, as well as the encoding used for categorical factors.

Examples

#Generating a simple 2x3 factorial to feed into our optimal design generation
#of an 11-run design.
factorial = expand.grid(A = c(1, -1), B = c(1, -1), C = c(1, -1))

optdesign = gen_design(candidateset = factorial,
                      model= ~A + B + C, trials = 11, optimality = "D", repeats = 100)

#Now evaluating that design (with default anticipated coefficients and a effectsize of 2):
eval_design(design = optdesign, model= ~A + B + C, alpha = 0.2)

#Evaluating a subset of the design (which changes the power due to a different number of
#degrees of freedom)
eval_design(design = optdesign, model= ~A + C, alpha = 0.2)

#We do not have to input the model if it's the same as the model used
#During design generation. Here, we also use the default value for alpha (`0.05`)
eval_design(optdesign)

#Halving the signal-to-noise ratio by setting a different effectsize (default is 2):
eval_design(design = optdesign, model= ~A + B + C, alpha = 0.2, effectsize = 1)

#With 3+ level categorical factors, the choice of anticipated coefficients directly changes the
#final power calculation. For the most conservative power calculation, that involves
#setting all anticipated coefficients in a factor to zero except for one. We can specify this
#option with the "conservative" argument.

factorialcoffee = expand.grid(cost = c(1, 2),
                              type = as.factor(c("Kona", "Colombian", "Ethiopian", "Sumatra")),
                              size = as.factor(c("Short", "Grande", "Venti")))

designcoffee = gen_design(factorialcoffee,
                         ~cost + size + type, trials = 29, optimality = "D", repeats = 100)

#Evaluate the design, with default anticipated coefficients (conservative is FALSE by default).
eval_design(designcoffee)

#Evaluate the design, with conservative anticipated coefficients:
eval_design(designcoffee, conservative = TRUE)

#which is the same as the following, but now explicitly entering the coefficients:
eval_design(designcoffee, anticoef = c(1, 1, 1, 0, 0, 1, 0))

#You can also evaluate the design with higher order effects, even if they were not
#used in design generation:
eval_design(designcoffee, model = ~cost + size + type + cost * type)


#Generating and evaluating a split plot design:
splitfactorialcoffee = expand.grid(caffeine = c(1, -1),
                                  cost = c(1, 2),
                                  type = as.factor(c("Kona", "Colombian", "Ethiopian", "Sumatra")),
                                  size = as.factor(c("Short", "Grande", "Venti")))

coffeeblockdesign = gen_design(splitfactorialcoffee, ~caffeine, trials = 12)
coffeefinaldesign = gen_design(splitfactorialcoffee,
                              model = ~caffeine + cost + size + type, trials = 36,
                              splitplotdesign = coffeeblockdesign, blocksizes = 3)

#Evaluating design (blocking is automatically detected)
eval_design(coffeefinaldesign, 0.2, blocking = TRUE)

#Manually turn blocking off to see completely randomized design power
eval_design(coffeefinaldesign, 0.2, blocking = FALSE)

#We can also evaluate the design with a custom ratio between the whole plot error to
#the run-to-run error.
eval_design(coffeefinaldesign, 0.2, varianceratios = 2)

#If the design was generated outside of skpr and thus the row names do not have the
#blocking structure encoded already, the user can add these manually. For a 12-run
#design with 4 blocks, here is a vector indicated the blocks:

blockcolumn = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)

#If we wanted to add this blocking structure to the design coffeeblockdesign, we would
#add a column with the format "Block1", "Block2", "Block3" ... and each one will be treated
#as a separate blocking layer.

coffeeblockdesign$Block1 = blockcolumn

#By default, skpr will throw out the blocking columns unless the user specifies `blocking = TRUE`.
eval_design(coffeeblockdesign, blocking=TRUE)

Monte Carlo power evaluation for experimental designs with user-supplied libraries

Description

Evaluates the power of an experimental design, given its run matrix and the statistical model to be fit to the data, using monte carlo simulation. Simulated data is fit using a user-supplied fitting library and power is estimated by the fraction of times a parameter is significant. Returns a data frame of parameter powers.

Usage

eval_design_custom_mc(
  design,
  model = NULL,
  alpha = 0.05,
  nsim,
  rfunction,
  fitfunction,
  pvalfunction,
  anticoef,
  effectsize = 2,
  contrasts = contr.sum,
  coef_function = coef,
  calceffect = FALSE,
  detailedoutput = FALSE,
  parameternames = NULL,
  advancedoptions = NULL,
  progress = TRUE,
  parallel = FALSE,
  parallel_pkgs = NULL,
  ...
)

Arguments

design

The experimental design. Internally, eval_design_custom_mc rescales each numeric column to the range [-1, 1].

model

The model used in evaluating the design. If this is missing and the design was generated with skpr, the generating model will be used. It can be a subset of the model used to generate the design, or include higher order effects not in the original design generation. It cannot include factors that are not present in the experimental design.

alpha

Default '0.05'. The type-I error. p-values less than this will be counted as significant.

nsim

The number of simulations.

rfunction

Random number generator function. Should be a function of the form f(X, b), where X is the model matrix and b are the anticipated coefficients.

fitfunction

Function used to fit the data. Should be of the form f(formula, X, contrasts) where X is the model matrix. If contrasts do not need to be specified for the user supplied library, that argument can be ignored.

pvalfunction

Function that returns a vector of p-values from the object returned from the fitfunction.

anticoef

The anticipated coefficients for calculating the power. If missing, coefficients will be automatically generated based on effectsize.

effectsize

The signal-to-noise ratio. Default 2. For a gaussian model, and for continuous factors, this specifies the difference in response between the highest and lowest levels of a factor (which are +1 and -1 after normalization). More precisely: If you do not specify anticoef, the anticipated coefficients will be half of effectsize. If you do specify anticoef, effectsize will be ignored.

contrasts

Default contr.sum. Function used to generate the contrasts encoding for categorical variables. If the user has specified their own contrasts for a categorical factor using the contrasts function, those will be used. Otherwise, skpr will use contr.sum.

coef_function

Function that, when applied to a fitfunction return object, returns the estimated coefficients.

calceffect

Default 'FALSE'. Calculates effect power for a Type-III Anova (using the car package) using a Wald test. this ratio can be a vector specifying the variance ratio for each subplot. Otherwise, it will use a single value for all strata. To work, the fit returned by 'fitfunction' must have a method compatable with the car package.

detailedoutput

Default 'FALSE'. If 'TRUE', return additional information about evaluation in results.

parameternames

Vector of parameter names if the coefficients do not correspond simply to the columns in the model matrix (e.g. coefficients from an MLE fit).

advancedoptions

Default 'NULL'. Named list of advanced options. 'advancedoptions$anovatype' specifies the Anova type in the car package (default type 'III'), user can change to type 'II'). 'advancedoptions$anovatest' specifies the test statistic if the user does not want a 'Wald' test–other options are likelyhood-ratio 'LR' and F-test 'F'. 'advancedoptions$progressBarUpdater' is a function called in non-parallel simulations that can be used to update external progress bar.'advancedoptions$GUI' turns off some warning messages when in the GUI. If 'advancedoptions$save_simulated_responses = TRUE', the dataframe will have an attribute 'simulated_responses' that contains the simulated responses from the power evaluation. 'advancedoptions$ci_error_conf' will set the confidence level for power intervals, which are printed when 'detailedoutput = TRUE'.

progress

Default 'TRUE'. Whether to include a progress bar.

parallel

Default 'FALSE'. If 'TRUE', the power simulation will use all but one of the available cores. If the user wants to set the number of cores manually, they can do this by setting 'options("cores")' to the desired number (e.g. 'options("cores" = parallel::detectCores())'). NOTE: If you have installed BLAS libraries that include multicore support (e.g. Intel MKL that comes with Microsoft R Open), turning on parallel could result in reduced performance.

parallel_pkgs

A vector of strings listing the external packages to be included in each parallel worker.

...

Additional arguments.

Value

A data frame consisting of the parameters and their powers. The parameter estimates from the simulations are stored in the 'estimates' attribute.

Examples

#To demonstrate how a user can use their own libraries for Monte Carlo power generation,
#We will recreate eval_design_survival_mc using the eval_design_custom_mc framework.

#To begin, first let us generate the same design and random generation function shown in the
#eval_design_survival_mc examples:

basicdesign = expand.grid(a = c(-1, 1), b = c("a","b","c"))
design = gen_design(candidateset = basicdesign, model = ~a + b + a:b, trials = 100,
                         optimality = "D", repeats = 100)

#Random number generating function

rsurvival = function(X, b) {
 Y = rexp(n = nrow(X), rate = exp(-(X %*% b)))
 censored = Y > 1
 Y[censored] = 1
 return(survival::Surv(time = Y, event = !censored, type = "right"))
}

#We now need to tell the package how we want to fit our data,
#given the formula and the model matrix X (and, if needed, the list of contrasts).
#If the contrasts aren't required, "contrastslist" should be set to NULL.
#This should return some type of fit object.

fitsurv = function(formula, X, contrastslist = NULL) {
 return(survival::survreg(formula, data = X, dist = "exponential"))
}


#We now need to tell the package how to extract the p-values from the fit object returned
#from the fit function. This is how to extract the p-values from the survreg fit object:

pvalsurv = function(fit) {
 return(summary(fit)$table[, 4])
}

#And now we evaluate the design, passing the fitting function and p-value extracting function
#in along with the standard inputs for eval_design_mc.
#This has the exact same behavior as eval_design_survival_mc for the exponential distribution.
eval_design_custom_mc(design = design, model = ~a + b + a:b,
                     alpha = 0.05, nsim = 100,
                     fitfunction = fitsurv, pvalfunction = pvalsurv,
                     rfunction = rsurvival, effectsize = 1)
#We can also use skpr's framework for parallel computation to automatically parallelize this
#to speed up computation
## Not run: eval_design_custom_mc(design = design, model = ~a + b + a:b,
                         alpha = 0.05, nsim = 1000,
                         fitfunction = fitsurv, pvalfunction = pvalsurv,
                         rfunction = rsurvival, effectsize = 1,
                         parallel = TRUE, parallel_pkgs = "survival")

## End(Not run)

Monte Carlo Power Evaluation for Experimental Designs

Description

Evaluates the power of an experimental design, given the run matrix and the statistical model to be fit to the data, using monte carlo simulation. Simulated data is fit using a generalized linear model and power is estimated by the fraction of times a parameter is significant. Returns a data frame of parameter powers.

Usage

eval_design_mc(
  design,
  model = NULL,
  alpha = 0.05,
  blocking = NULL,
  nsim = 1000,
  glmfamily = "gaussian",
  calceffect = TRUE,
  effect_anova = TRUE,
  varianceratios = NULL,
  rfunction = NULL,
  anticoef = NULL,
  firth = FALSE,
  effectsize = 2,
  contrasts = contr.sum,
  high_resolution_candidate_set = NA,
  moment_sample_density = 20,
  parallel = FALSE,
  adjust_alpha_inflation = FALSE,
  detailedoutput = FALSE,
  progress = TRUE,
  advancedoptions = NULL,
  ...
)

Arguments

design

The experimental design. Internally, eval_design_mc rescales each numeric column to the range [-1, 1].

model

The model used in evaluating the design. If this is missing and the design was generated with skpr, the generating model will be used. It can be a subset of the model used to generate the design, or include higher order effects not in the original design generation. It cannot include factors that are not present in the experimental design.

alpha

Default '0.05'. The type-I error. p-values less than this will be counted as significant.

blocking

Default 'NULL'. If 'TRUE', eval_design_mc will look at the rownames (or blocking columns) to determine blocking structure. Default FALSE.

nsim

Default '1000'. The number of Monte Carlo simulations to perform.

glmfamily

Default 'gaussian'. String indicating the family of distribution for the 'glm' function ("gaussian", "binomial", "poisson", or "exponential").

calceffect

Default 'TRUE'. Whether to calculate effect power. This calculation is more expensive than parameter power, so turned off (if not needed) can greatly speed up calculation time.

effect_anova

Default 'TRUE', whether to a Type-III Anova or a likelihood ratio test to calculate effect power. If 'TRUE', effect power will be calculated using a Type-III Anova (using the car package) and a Wald test. If 'FALSE', a likelihood ratio test (using a reduced model for each effect) will performed using the 'lmtest' package. If 'firth = TRUE', this will be set to 'FALSE' automatically.

varianceratios

Default 'NULL'. The ratio of the whole plot variance to the run-to-run variance. If not specified during design generation, this will default to 1. For designs with more than one subplot this ratio can be a vector specifying the variance ratio for each subplot (comparing to the run-to-run variance). Otherwise, it will use a single value for all strata.

rfunction

Default 'NULL'.Random number generator function for the response variable. Should be a function of the form f(X, b, delta), where X is the model matrix, b are the anticipated coefficients, and delta is a vector of blocking errors. Typically something like rnorm(nrow(X), X * b + delta, 1). You only need to specify this if you do not like the default behavior described below.

anticoef

Default 'NULL'.The anticipated coefficients for calculating the power. If missing, coefficients will be automatically generated based on the effectsize argument.

firth

Default 'FALSE'. Whether to apply the firth correction (via the 'mbest' package) to a logistic regression. This setting also automatically sets 'effect_lr = TRUE'.

effectsize

Helper argument to generate anticipated coefficients. See details for more info. If you specify anticoef, effectsize will be ignored.

contrasts

Default contr.sum. The contrasts to use for categorical factors. If the user has specified their own contrasts for a categorical factor using the contrasts function, those will be used. Otherwise, skpr will use contr.sum. If the user wants to set the number of cores manually, they can do this by setting 'options("cores")' to the desired number (e.g. 'options("cores" = parallel::detectCores())'). NOTE: If you have installed BLAS libraries that include multicore support (e.g. Intel MKL that comes with Microsoft R Open), turning on parallel could result in reduced performance.

high_resolution_candidate_set

Default 'NA'. If you have continuous numeric terms and disallowed combinations, the closed-form I-optimality value cannot be calculated and must be approximated by numeric integration. This requires sampling the allowed space densely, but most candidate sets will provide a sparse sampling of allowable points. To work around this, skpr will generate a convex hull of the numeric terms for each unique combination of categorical factors to generate a dense sampling of the space and cache that value internally, but this is a slow calculation and does not support non-convex candidate sets. To speed up moment matrix calculation, pass a higher resolution version of your candidate set here with the disallowed combinations already applied. If you generated your design externally from skpr, there are disallowed combinations in your design, and need correct I-optimalituy values, you must pass your candidate set here.

moment_sample_density

Default '20'. The density of points to sample when calculating the moment matrix to compute I-optimality. Only required if the design was generated outside of skpr and there are disallowed combinations.

parallel

Default 'FALSE'. If 'TRUE', the Monte Carlo power calculation will use all but one of the available cores. If the user wants to set the number of cores manually, they can do this by setting 'options("cores")' to the desired number (e.g. 'options("cores" = parallel::detectCores())'). NOTE: If you have installed BLAS libraries that include multicore support (e.g. Intel MKL that comes with Microsoft R Open), turning on parallel could result in reduced performance.

adjust_alpha_inflation

Default 'FALSE'. If 'TRUE', this will run the simulation twice: first to calculate the empirical distribution of p-values under the null hypothesis and find the true Type-I error cutoff that corresponds to the desired Type-I error rate, and then again given effect size to calculate power values.

detailedoutput

Default 'FALSE'. If 'TRUE', return additional information about evaluation in results.

progress

Default 'TRUE'. Whether to include a progress bar.

advancedoptions

Default 'NULL'. Named list of advanced options. 'advancedoptions$anovatype' specifies the Anova type in the car package (default type 'III'), user can change to type 'II'). 'advancedoptions$anovatest' specifies the test statistic if the user does not want a 'Wald' test–other options are likelyhood-ratio 'LR' and F-test 'F'. 'advancedoptions$progressBarUpdater' is a function called in non-parallel simulations that can be used to update external progress bar.'advancedoptions$GUI' turns off some warning messages when in the GUI. If 'advancedoptions$save_simulated_responses = TRUE', the dataframe will have an attribute 'simulated_responses' that contains the simulated responses from the power evaluation. 'advancedoptions$ci_error_conf' will set the confidence level for power intervals, which are printed when 'detailedoutput = TRUE'.

...

Additional arguments.

Details

Evaluates the power of a design with Monte Carlo simulation. Data is simulated and then fit with a generalized linear model, and the fraction of simulations in which a parameter is significant (its p-value, according to the fit function used, is less than the specified alpha) is the estimate of power for that parameter.

First, if blocking = TURE, the random noise from blocking is generated with rnorm. Each block gets a single sample of Gaussian random noise, with a variance as specified in varianceratios, and that sample is copied to each run in the block. Then, rfunction is called to generate a simulated response for each run of the design, and the data is fit using the appropriate fitting function. The functions used to simulate the data and fit it are determined by the glmfamily and blocking arguments as follows. Below, X is the model matrix, b is the anticipated coefficients, and d is a vector of blocking noise (if blocking = FALSE then d = 0):

glmfamily blocking rfunction fit
"gaussian" F rnorm(mean = X %*% b + d, sd = 1) lm
"gaussian" T rnorm(mean = X %*% b + d, sd = 1) lme4::lmer
"binomial" F rbinom(prob = 1/(1+exp(-(X %*% b + d)))) glm(family = "binomial")
"binomial" T rbinom(prob = 1/(1+exp(-(X %*% b + d)))) lme4::glmer(family = "binomial")
"poisson" F rpois(lambda = exp((X %*% b + d))) glm(family = "poisson")
"poisson" T rpois(lambda = exp((X %*% b + d))) lme4::glmer(family = "poisson")
"exponential" F rexp(rate = exp(-(X %*% b + d))) glm(family = Gamma(link = "log"))
"exponential" T rexp(rate = exp(-(X %*% b + d))) lme4:glmer(family = Gamma(link = "log"))

Note that the exponential random generator uses the "rate" parameter, but skpr and glm use the mean value parameterization (= 1 / rate), hence the minus sign above. Also note that the gaussian model assumes a root-mean-square error of 1.

Power is dependent on the anticipated coefficients. You can specify those directly with the anticoef argument, or you can use the effectsize argument to specify an effect size and skpr will auto-generate them. You can provide either a length-1 or length-2 vector. If you provide a length-1 vector, the anticipated coefficients will be half of effectsize; this is equivalent to saying that the linear predictor (for a gaussian model, the mean response; for a binomial model, the log odds ratio; for an exponential model, the log of the mean value; for a poisson model, the log of the expected response) changes by effectsize when a continuous factor goes from its lowest level to its highest level. If you provide a length-2 vector, the anticipated coefficients will be set such that the mean response (for a gaussian model, the mean response; for a binomial model, the probability; for an exponential model, the mean response; for a poisson model, the expected response) changes from effectsize[1] to effectsize[2] when a factor goes from its lowest level to its highest level, assuming that the other factors are inactive (their x-values are zero).

The effect of a length-2 effectsize depends on the glmfamily argument as follows:

For glmfamily = 'gaussian', the coefficients are set to (effectsize[2] - effectsize[1]) / 2.

For glmfamily = 'binomial', the intercept will be 1/2 * log(effectsize[1] * effectsize[2] / (1 - effectsize[1]) / (1 - effectsize[2])), and the other coefficients will be 1/2 * log(effectsize[2] * (1 - effectsize[1]) / (1 - effectsize[2]) / effectsize[1]).

For glmfamily = 'exponential' or 'poisson', the intercept will be 1 / 2 * (log(effectsize[2]) + log(effectsize[1])), and the other coefficients will be 1 / 2 * (log(effectsize[2]) - log(effectsize[1])).

Value

A data frame consisting of the parameters and their powers, with supplementary information stored in the data frame's attributes. The parameter estimates from the simulations are stored in the "estimates" attribute. The "model.matrix" attribute contains the model matrix that was used for power evaluation, and also provides the encoding used for categorical factors. If you want to specify the anticipated coefficients manually, do so in the order the parameters appear in the model matrix.

Examples

#We first generate a full factorial design using expand.grid:
factorialcoffee = expand.grid(cost = c(-1, 1),
                              type = as.factor(c("Kona", "Colombian", "Ethiopian", "Sumatra")),
                              size = as.factor(c("Short", "Grande", "Venti")))
if(skpr:::run_documentation()) {
#And then generate the 21-run D-optimal design using gen_design.
designcoffee = gen_design(factorialcoffee,
                         model = ~cost + type + size, trials = 21, optimality = "D")
}
if(skpr:::run_documentation()) {
#To evaluate this design using a normal approximation, we just use eval_design
#(here using the default settings for contrasts, effectsize, and the anticipated coefficients):

eval_design(design = designcoffee, model = ~cost + type + size, 0.05)
}
if(skpr:::run_documentation()) {
#To evaluate this design with a Monte Carlo method, we enter the same information
#used in eval_design, with the addition of the number of simulations "nsim" and the distribution
#family used in fitting for the glm "glmfamily". For gaussian, binomial, exponential, and poisson
#families, a default random generating function (rfunction) will be supplied. If another glm
#family is used or the default random generating function is not adequate, a custom generating
#function can be supplied by the user. Like in `eval_design()`, if the model isn't entered, the
#model used in generating the design will be used.

eval_design_mc(designcoffee, nsim = 100, glmfamily = "gaussian")
}
if(skpr:::run_documentation()) {
#We can also add error bars on the Monte Carlo power values by setting
#`detailedoutput = TRUE` (which will print out other information as well).
#We can set the confidence via the `advancedoptions` argument.
eval_design_mc(designcoffee, nsim = 100, glmfamily = "gaussian",
              detailedoutput = TRUE, advancedoptions = list(ci_error_conf  = 0.8))
}
if(skpr:::run_documentation()) {
#We see here we generate approximately the same parameter powers as we do
#using the normal approximation in eval_design. Like eval_design, we can also change
#effectsize to produce a different signal-to-noise ratio:

eval_design_mc(design = designcoffee, nsim = 100,
                       glmfamily = "gaussian", effectsize = 1)
}
if(skpr:::run_documentation()) {
#Like eval_design, we can also evaluate the design with a different model than
#the one that generated the design.
eval_design_mc(design = designcoffee, model = ~cost + type, alpha = 0.05,
              nsim = 100, glmfamily = "gaussian")
}
if(skpr:::run_documentation()) {
#And here it is evaluated with additional interactions included:
eval_design_mc(design = designcoffee, model = ~cost + type + size + cost * type, 0.05,
              nsim = 100, glmfamily = "gaussian")
}
if(skpr:::run_documentation()) {
#We can also set "parallel = TRUE" to use all the cores available to speed up
#computation.
eval_design_mc(design = designcoffee, nsim = 10000,
                       glmfamily = "gaussian", parallel = TRUE)
}
if(skpr:::run_documentation()) {
#We can also evaluate split-plot designs. First, let us generate the split-plot design:

factorialcoffee2 = expand.grid(Temp = c(1, -1),
                               Store = as.factor(c("A", "B")),
                               cost = c(-1, 1),
                               type = as.factor(c("Kona", "Colombian", "Ethiopian", "Sumatra")),
                               size = as.factor(c("Short", "Grande", "Venti")))

vhtcdesign = gen_design(factorialcoffee2,
                       model = ~Store, trials = 6, varianceratio = 1)
htcdesign = gen_design(factorialcoffee2, model = ~Store + Temp, trials = 18,
                       splitplotdesign = vhtcdesign, blocksizes = rep(3, 6), varianceratio = 1)
splitplotdesign = gen_design(factorialcoffee2,
                            model = ~Store + Temp + cost + type + size, trials = 54,
                            splitplotdesign = htcdesign, blocksizes = rep(3, 18),
                            varianceratio = 1)

#Each block has an additional noise term associated with it in addition to the normal error
#term in the model. This is specified by a vector specifying the additional variance for
#each split-plot level. This is equivalent to specifying a variance ratio of one between
#the whole plots and the run-to-run variance for gaussian models.

#Evaluate the design. Note the decreased power for the blocking factors.
eval_design_mc(splitplotdesign, blocking = TRUE, nsim = 100,
                       glmfamily = "gaussian", varianceratios = c(1, 1, 1))
}
if(skpr:::run_documentation()) {
#We can also use this method to evaluate designs that cannot be easily
#evaluated using normal approximations. Here, we evaluate a design with a binomial response and see
#whether we can detect the difference between each factor changing whether an event occurs
#70% of the time or 90% of the time.

factorialbinom = expand.grid(a = c(-1, 1), b = c(-1, 1))
designbinom = gen_design(factorialbinom, model = ~a + b, trials = 90, optimality = "D")

eval_design_mc(designbinom, ~a + b, alpha = 0.2, nsim = 100, effectsize = c(0.7, 0.9),
              glmfamily = "binomial")
}
if(skpr:::run_documentation()) {
#We can also use this method to determine power for poisson response variables.
#Generate the design:

factorialpois = expand.grid(a = as.numeric(c(-1, 0, 1)), b = c(-1, 0, 1))
designpois = gen_design(factorialpois, ~a + b, trials = 70, optimality = "D")

#Evaluate the power:

eval_design_mc(designpois, ~a + b, 0.05, nsim = 100, glmfamily = "poisson",
               anticoef = log(c(0.2, 2, 2)))
}

#The coefficients above set the nominal value -- that is, the expected count
#when all inputs = 0 -- to 0.2 (from the intercept), and say that each factor
#changes this count by a factor of 4 (multiplied by 2 when x= +1, and divided by 2 when x = -1).
#Note the use of log() in the anticipated coefficients.

Evaluate Power for Survival Design

Description

Evaluates power for an experimental design in which the response variable may be right- or left-censored. Power is evaluated with a Monte Carlo simulation, using the survival package and survreg to fit the data. Split-plot designs are not supported.

Usage

eval_design_survival_mc(
  design,
  model = NULL,
  alpha = 0.05,
  nsim = 1000,
  distribution = "gaussian",
  censorpoint = NA,
  censortype = "right",
  rfunctionsurv = NULL,
  anticoef = NULL,
  effectsize = 2,
  contrasts = contr.sum,
  parallel = FALSE,
  detailedoutput = FALSE,
  progress = TRUE,
  advancedoptions = NULL,
  ...
)

Arguments

design

The experimental design. Internally, all numeric columns will be rescaled to [-1, +1].

model

The model used in evaluating the design. If this is missing and the design was generated with skpr, the generating model will be used. It can be a subset of the model used to generate the design, or include higher order effects not in the original design generation. It cannot include factors that are not present in the experimental design.

alpha

Default '0.05'. The type-I error. p-values less than this will be counted as significant.

nsim

The number of simulations. Default 1000.

distribution

Distribution of survival function to use when fitting the data. Valid choices are described in the documentation for survreg. Supported options are "exponential", "lognormal", or "gaussian". Default "gaussian".

censorpoint

The point after/before (for right-censored or left-censored data, respectively) which data should be labelled as censored. Default NA for no censoring. This argument is used only by the internal random number generators; if you supply your own function to the rfunctionsurv parameter, then this parameter will be ignored.

censortype

The type of censoring (either "left" or "right"). Default "right".

rfunctionsurv

Random number generator function. Should be a function of the form f(X, b), where X is the model matrix and b are the anticipated coefficients. This function should return a Surv object from the survival package. You do not need to provide this argument if distribution is one of the supported choices and you are satisfied with the default behavior described below.

anticoef

The anticipated coefficients for calculating the power. If missing, coefficients will be automatically generated based on the effectsize argument.

effectsize

Helper argument to generate anticipated coefficients. See details for more info. If you specify anticoef, effectsize will be ignored.

contrasts

Default contr.sum. Function used to encode categorical variables in the model matrix. If the user has specified their own contrasts for a categorical factor using the contrasts function, those will be used. Otherwise, skpr will use contr.sum.

parallel

Default 'FALSE'. If 'TRUE', the power simulation will use all but one of the available cores. If the user wants to set the number of cores manually, they can do this by setting 'options("cores")' to the desired number (e.g. 'options("cores" = parallel::detectCores())'). NOTE: If you have installed BLAS libraries that include multicore support (e.g. Intel MKL that comes with Microsoft R Open), turning on parallel could result in reduced performance.

detailedoutput

Default 'FALSE'. If 'TRUE', return additional information about evaluation in results.

progress

Default 'TRUE'. Whether to include a progress bar.

advancedoptions

Default 'NULL'. Named list of advanced options. Pass 'progressBarUpdater' to include function called in non-parallel simulations that can be used to update external progress bar. 'advancedoptions$ci_error_conf' will set the confidence level for power intervals, which are printed when 'detailedoutput = TRUE'.

...

Any additional arguments to be passed into the survreg function during fitting.

Details

Evaluates the power of a design with Monte Carlo simulation. Data is simulated and then fit with a survival model (survival::survreg), and the fraction of simulations in which a parameter is significant (its p-value is less than the specified alpha) is the estimate of power for that parameter.

If not supplied by the user, rfunctionsurv will be generated based on the distribution argument as follows:

distribution generating function
"gaussian" rnorm(mean = X %*% b, sd = 1)
"exponential" rexp(rate = exp(-X %*% b))
"lognormal" rlnorm(meanlog = X %*% b, sdlog = 1)

In each case, if a simulated data point is past the censorpoint (greater than for right-censored, less than for left-censored) it is marked as censored. See the examples below for how to construct your own function.

Power is dependent on the anticipated coefficients. You can specify those directly with the anticoef argument, or you can use the effectsize argument to specify an effect size and skpr will auto-generate them. You can provide either a length-1 or length-2 vector. If you provide a length-1 vector, the anticipated coefficients will be half of effectsize; this is equivalent to saying that the linear predictor (for a gaussian model, the mean response; for an exponential model or lognormal model, the log of the mean value) changes by effectsize when a continuous factor goes from its lowest level to its highest level. If you provide a length-2 vector, the anticipated coefficients will be set such that the mean response changes from effectsize[1] to effectsize[2] when a factor goes from its lowest level to its highest level, assuming that the other factors are inactive (their x-values are zero).

The effect of a length-2 effectsize depends on the distribution argument as follows:

For distribution = 'gaussian', the coefficients are set to (effectsize[2] - effectsize[1]) / 2.

For distribution = 'exponential' or 'lognormal', the intercept will be 1 / 2 * (log(effectsize[2]) + log(effectsize[1])), and the other coefficients will be 1 / 2 * (log(effectsize[2]) - log(effectsize[1])).

Value

A data frame consisting of the parameters and their powers. The parameter estimates from the simulations are stored in the 'estimates' attribute. The 'modelmatrix' attribute contains the model matrix and the encoding used for categorical factors. If you manually specify anticipated coefficients, do so in the order of the model matrix.

Examples

#These examples focus on the survival analysis case and assume familiarity
#with the basic functionality of eval_design_mc.

#We first generate a simple 2-level design using expand.grid:
basicdesign = expand.grid(a = c(-1, 1))
design = gen_design(candidateset = basicdesign, model = ~a, trials = 15)

#We can then evaluate the power of the design in the same way as eval_design_mc,
#now including the type of censoring (either right or left) and the point at which
#the data should be censored:

eval_design_survival_mc(design = design, model = ~a, alpha = 0.05,
                        nsim = 100, distribution = "exponential",
                        censorpoint = 5, censortype = "right")

#Built-in Monte Carlo random generating functions are included for the gaussian, exponential,
#and lognormal distributions.

#We can also evaluate different censored distributions by specifying a custom
#random generating function and changing the distribution argument.

rlognorm = function(X, b) {
  Y = rlnorm(n = nrow(X), meanlog = X %*% b, sdlog = 0.4)
  censored = Y > 1.2
  Y[censored] = 1.2
  return(survival::Surv(time = Y, event = !censored, type = "right"))
}

#Any additional arguments are passed into the survreg function call.  As an example, you
#might want to fix the "scale" argument to survreg, when fitting a lognormal:

eval_design_survival_mc(design = design, model = ~a, alpha = 0.2, nsim = 100,
                        distribution = "lognormal", rfunctionsurv = rlognorm,
                        anticoef = c(0.184, 0.101), scale = 0.4)

Extract p-values from a model object

Description

Extract p-values from a model object. Currently works with lm, glm, lme4, glmer, and survreg model objects. If possible, uses the p-values reported in summary(model_fit).

Usage

extractPvalues(model_fit, glmfamily = "gaussian")

Arguments

model_fit

The model object from which to extract.

Value

Returns a vector of p-values. If model_fit is not a supported model type, returns NULL.


Generates Anticipated Coefficients

Description

Generates Anticipated Coefficients

Usage

gen_anticoef(RunMatrix, model, nointercept)

Arguments

RunMatrix

The run matrix

model

The model

nointercept

TRUE if intercept not in model

Value

Anticipated coefficients.


Generates Binomial Anticipated Coefficients

Description

Generates Binomial Anticipated Coefficients Solves the logistic function log(p / (1-p)) = beta0 + beta1 * x such that p = lowprob when x = -1, and p = highprob when x = +1. Equivalently, solves this set of equations for beta0 and beta1: log(lowprob / (1 - lowprob)) = beta0 - beta1 log(highprob / (1 - highprob)) = beta0 + beta1

Usage

gen_binomial_anticoef(anticoef, lowprob, highprob)

Arguments

anticoef

Anticipated coefficients

lowprob

Default 0.50. The base probability

highprob

Default 0.80. The high probability

Value

Anticipated coefficients.


Generate optimal experimental designs

Description

Creates an experimental design given a model, desired number of runs, and a data frame of candidate test points. gen_design chooses points from the candidate set and returns a design that is optimal for the given statistical model.

Usage

gen_design(
  candidateset,
  model,
  trials,
  splitplotdesign = NULL,
  blocksizes = NULL,
  optimality = "D",
  augmentdesign = NULL,
  repeats = 20,
  custom_v = NULL,
  varianceratio = 1,
  contrast = contr.simplex,
  aliaspower = 2,
  minDopt = 0.8,
  k = NA,
  moment_sample_density = 20,
  high_resolution_candidate_set = NA,
  parallel = FALSE,
  progress = TRUE,
  add_blocking_columns = FALSE,
  randomized = TRUE,
  advancedoptions = NULL,
  timer = NULL
)

Arguments

candidateset

A data frame of candidate test points; each run of the optimal design will be chosen (with replacement) from this candidate set. Each row of the data frame is a candidate test point. Each row should be unique. Usually this is a full factorial test matrix generated for the factors in the model unless there are disallowed combinations of runs. Factors present in the candidate set but not present in the model are stripped out, and the duplicate entries in the candidate set are removed. Disallowed combinations can be specified by simply removing them from the candidate set. Disallowed combinations between a hard-to-change and an easy-to-change factor are detected by comparing an internal candidate set generated by the unique levels present in the candidate set and the split plot design. Those points are then excluded from the search. If a factor is continuous, its column should be type numeric. If a factor is categorical, its column should be type factor or character.

model

The statistical model used to generate the test design.

trials

The number of runs in the design.

splitplotdesign

If 'NULL', a fully randomized design is generated. If not NULL, a split-plot design is generated, and this argument specifies the design for all of the factors harder to change than the current set of factors. Each row corresponds to a block in which the harder to change factors will be held constant. Each row of splitplotdesign will be replicated as specified in blocksizes, and the optimal design is found for all of the factors given in the model argument, taking into consideration the fixed and replicated hard-to-change factors. If blocksizes is missing, 'gen_design' will attempt to allocate the runs in the most balanced design possible, given the number of blocks given in the argument 'splitplotdesign' and the total number of 'trials'.

blocksizes

Default 'NULL'. Specifies the block size(s) for design generation. If only one number is passed, 'gen_design()' will create blocks of the specified size, and if the total number of run specified in 'trials' is not divisible by the number, 'gen_design()' will attempt to allocate the runs in the most balanced design possible. If a list is passed, each entry in the list will specify an additional layer of blocking. If 'splitplotdesign' is not 'NULL', this argument specifies the number of subplots within each whole plot (each whole plot corresponding to a row in the 'splitplotdesign' data.frame).

optimality

Default 'D'. The optimality criterion used in generating the design. Full list of supported criteria: "D", "I", "A", "ALIAS", "G", "T", "E", or "CUSTOM". If "CUSTOM", user must also define a function of the model matrix named 'customOpt' in their namespace that returns a single value, which the algorithm will attempt to optimize. For 'CUSTOM' optimality split-plot designs, the user must instead define 'customBlockedOpt', which should be a function of the model matrix and the variance-covariance matrix. For information on the algorithm behind Alias-optimal designs, see Jones and Nachtsheim. "Efficient Designs With Minimal Aliasing." Technometrics, vol. 53, no. 1, 2011, pp. 62-71.

augmentdesign

Default NULL. A 'data.frame' of runs that are fixed during the optimal search process. The columns of 'augmentdesign' must match those of the candidate set. The search algorithm will search for the optimal 'trials' - 'nrow(augmentdesign)' remaining runs.

repeats

Default '20'. The number of times to repeat the search for the best optimal design.

custom_v

Default 'NULL'. The user can pass a custom variance-covariance matrix to be used during blocked design generation.

varianceratio

Default '1'. The ratio between the block and run-to-run variance for a given stratum in a split plot/blocked design. This requires a design passed into 'splitplotdesign', so it will be overridden to '1' if no split plot design is entered.

contrast

Default 'contr.simplex', an orthonormal sum contrast. Function used to generate the encoding for categorical variables.

aliaspower

Default '2'. Degree of interactions to be used in calculating the alias matrix for alias optimal designs.

minDopt

Default '0.8'. Minimum value for the D-Optimality of a design when searching for Alias-optimal designs.

k

Default 'NA'. For D-optimal designs, this changes the search to a k-exchange algorithm Johnson and Nachtsheim. "Some Guidelines for Constructing Exact D-Optimal Designs on Convex Design Spaces." Technometrics, vol. 25, 1983, pp. 271-277. This exchanges only the k lowest variance runs in the design at each search iteration. Lower numbers can result in a faster search, but are less likely tofind an optimal design. Values of 'k >= n/4' have been shown empirically to generate similar designs to the full search. When ‘k == trials', this results in the default modified Federov’s algorithm. A ‘k' of 1 is a form of Wynn’s algorithm Wynn. "Results in the Theory and Construction of D-Optimum Experimental Designs," Journal of the Royal Statistical Society, Ser. B,vol. 34, 1972, pp. 133-14.

moment_sample_density

Default '20'. The density of points to sample when calculating the moment matrix to compute I-optimality if there are disallowed combinations. Otherwise, the closed-form moment matrix can be calculated.

high_resolution_candidate_set

Default 'NA'. If you have continuous numeric terms and disallowed combinations, the closed-form I-optimality value cannot be calculated and must be approximated by numeric integration. This requires sampling the allowed space densely, but most candidate sets will provide a sparse sampling of allowable points. To work around this, skpr will generate a convex hull of the numeric terms for each unique combination of categorical factors to generate a dense sampling of the space and cache that value internally, but this is a slow calculation and does not support non-convex candidate sets. To speed up moment matrix calculation, pass a higher resolution version of your candidate set here with the disallowed combinations already applied.

parallel

Default 'FALSE'. If 'TRUE', the optimal design search will use all but one of the available cores. This can lead to a substantial speed-up in the search for complex designs. If the user wants to set the number of cores manually, they can do this by setting 'options("cores")' to the desired number (e.g. 'options("cores" = parallel::detectCores())'). NOTE: If you have installed BLAS libraries that include multicore support (e.g. Intel MKL that comes with Microsoft R Open), turning on parallel could result in reduced performance.

progress

Default 'TRUE'. Whether to include a progress bar.

add_blocking_columns

Default 'FALSE'. The blocking structure of the design will be indicated in the row names of the returned design. If 'TRUE', the design also will have extra columns to indicate the blocking structure. If no blocking is detected, no columns will be added.

randomized

Default 'TRUE', due to the intrinsic randomization of the design search algorithm. If 'FALSE', the randomized design will be re-ordered from left to right.

advancedoptions

Default 'NULL'. An named list for advanced users who want to adjust the optimal design algorithm parameters. Advanced option names are 'design_search_tolerance' (the smallest fractional increase below which the design search terminates), 'alias_tie_power' (the degree of the aliasing matrix when calculating optimality tie-breakers), 'alias_tie_tolerance' (the smallest absolute difference in the optimality criterion where designs are considered equal before considering the aliasing structure), 'alias_compare“ (which if set to FALSE turns off alias tie breaking completely), 'aliasmodel' (provided if the user does not want to calculate Alias-optimality using all 'aliaspower' interaction terms), and ‘progressBarUpdater“ (a function called in non-parallel optimal searches that can be used to update an external progress bar). Finally, there’s 'g_efficiency_method', which sets the method used to calculate G-efficiency (default is "random" for a random Monte Carlo sampling of the design space, "optim" for to use simulated annealing, or "custom" to explicitly define the points in the design space, which is the fastest method and the only way to calculate prediction variance with disallowed combinations). With this, there's also 'g_efficiency_samples', which specifies the number of random samples (default 1000 if 'g_efficiency_method = "random"'), attempts at simulated annealing (default 1 if 'g_efficiency_method = "optim"'), or a data.frame defining the exact points of the design space if 'g_efficiency_method = "custom"'.

timer

Deprecated: Use 'progress' instead.

Details

Split-plot designs can be generated with repeated applications of gen_design; see examples for details.

Value

A data frame containing the run matrix for the optimal design. The returned data frame contains supplementary information in its attributes, which can be accessed with the 'get_attributes()' and 'get_optimality()' functions.

Examples

#Generate the basic factorial candidate set with expand.grid.
#Generating a basic 2 factor candidate set:
basic_candidates = expand.grid(x1 = c(-1, 1), x2 = c(-1, 1))

#This candidate set is used as an input in the optimal design generation for a
#D-optimal design with 11 runs.
design = gen_design(candidateset = basic_candidates, model = ~x1 + x2, trials = 11)

#We can also use the dot formula to automatically use all of the terms in the model:
design = gen_design(candidateset = basic_candidates, model = ~., trials = 11)

#Here we add categorical factors, specified by using "as.factor" in expand.grid:
categorical_candidates = expand.grid(a = c(-1, 1),
                                     b = as.factor(c("A", "B")),
                                     c = as.factor(c("High", "Med", "Low")))

#This candidate set is used as an input in the optimal design generation.
design2 = gen_design(candidateset = categorical_candidates, model = ~a + b + c, trials = 19)

#We can also increase the number of times the algorithm repeats
#the search to increase the probability that the globally optimal design was found.
design2 = gen_design(candidateset = categorical_candidates,
                    model = ~a + b + c, trials = 19, repeats = 100)

#We can perform a k-exchange algorithm instead of a full search to help speed up
#the search process, although this can lead to less optimal designs. Here, we only
#exchange the 10 lowest variance runs in each search iteration.
if(skpr:::run_documentation()) {
design_k = gen_design(candidateset = categorical_candidates,
                    model = ~a + b + c, trials = 19, repeats = 100, k = 10)
}

#To speed up the design search, you can turn on multicore support with the parallel option.
#You can also customize the number of cores used by setting the cores option. By default,
#all cores are used.
if(skpr:::run_documentation()) {
options(cores = 2)
design2 = gen_design(categorical_candidates,
                    model = ~a + b + c, trials = 19, repeats = 1000, parallel = TRUE)
}

#You can also use a higher order model when generating the design:
design2 = gen_design(categorical_candidates,
                    model = ~a + b + c + a * b * c, trials = 12, repeats = 10)

#To evaluate a response surface design, include center points
#in the candidate set and include quadratic effects (but not for the categorical factors).

quad_candidates = expand.grid(a = c(1, 0, -1), b = c(-1, 0, 1), c = c("A", "B", "C"))

gen_design(quad_candidates, ~a + b + I(a^2) + I(b^2) + a * b * c, 20)

#The optimality criterion can also be changed:
gen_design(quad_candidates, ~a + b + I(a^2) + I(b^2) + a * b * c, 20,
          optimality = "I", repeats = 10)
gen_design(quad_candidates, ~a + b + I(a^2) + I(b^2) + a * b * c, 20,
          optimality = "A", repeats = 10)

#A blocked design can be generated by specifying the `blocksizes` argument. Passing a single
#number will create designs with blocks of that size, while passing multiple values in a list
#will specify multiple layers of blocking.

#Specify a single layer
gen_design(quad_candidates, ~a + b + c, 21, blocksizes=3, add_blocking_column=TRUE)

#Manually specify the block sizes for a single layer, must add to `trials``
gen_design(quad_candidates, ~a + b + c, 21, blocksizes=c(4,3,2,3,3,3,3),
          add_blocking_column=TRUE)

#Multiple layers of blocking
gen_design(quad_candidates, ~a + b + c, 21, blocksizes=list(7,3),
          add_blocking_column=TRUE)

#Multiple layers of blocking, specified individually
gen_design(quad_candidates, ~a + b + c, 21, blocksizes=list(7,c(4,3,2,3,3,3,3)),
          add_blocking_column=TRUE)

#A split-plot design can be generated by first generating an optimal blocking design using the
#hard-to-change factors and then using that as the input for the split-plot design.
#This generates an optimal subplot design that accounts for the existing split-plot settings.

splitplotcandidateset = expand.grid(Altitude = c(-1, 1),
                                    Range = as.factor(c("Close", "Medium", "Far")),
                                    Power = c(1, -1))
hardtochangedesign = gen_design(splitplotcandidateset, model = ~Altitude,
                               trials = 11, repeats = 10)

#Now we can use the D-optimal blocked design as an input to our full design.

#Here, we add the easy to change factors from the candidate set to the model,
#and input the hard-to-change design along with the new number of trials. `gen_design` will
#automatically allocate the runs in the blocks in the most balanced way possible.

designsplitplot = gen_design(splitplotcandidateset, ~Altitude + Range + Power, trials = 33,
                             splitplotdesign = hardtochangedesign, repeats = 10)

#If we want to allocate the blocks manually, we can do that with the argument `blocksizes`. This
#vector must sum to the number of `trials` specified.

#Putting this all together:
designsplitplot = gen_design(splitplotcandidateset, ~Altitude + Range + Power, trials = 33,
                             splitplotdesign = hardtochangedesign,
                             blocksizes = c(4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2), repeats = 10)

#The split-plot structure is encoded into the row names, with a period
#demarcating the blocking level. This process can be repeated for arbitrary
#levels of blocking (i.e. a split-plot design can be entered in as the hard-to-change
#to produce a split-split-plot design, which can be passed as another
#hard-to-change design to produce a split-split-split plot design, etc).
#In the following, note that the model builds up as we build up split plot strata.

splitplotcandidateset2 = expand.grid(Location = as.factor(c("East", "West")),
                                     Climate = as.factor(c("Dry", "Wet", "Arid")),
                                     Vineyard = as.factor(c("A", "B", "C", "D")),
                                     Age = c(1, -1))
#6 blocks of Location:
temp = gen_design(splitplotcandidateset2, ~Location, trials = 6, varianceratio = 2, repeats = 10)

#Each Location block has 2 blocks of Climate:
temp = gen_design(splitplotcandidateset2, ~Location + Climate,
                  trials = 12, splitplotdesign = temp, blocksizes = 2,
                  varianceratio = 1, repeats = 10)

#Each Climate block has 4 blocks of Vineyard:
temp = gen_design(splitplotcandidateset2, ~Location + Climate + Vineyard,
                  trials = 48, splitplotdesign = temp, blocksizes = 4,
                  varianceratio = 1, repeats = 10)

#Each Vineyard block has 4 runs with different Age:
if(skpr:::run_documentation()) {
splitsplitsplitplotdesign = gen_design(splitplotcandidateset2, ~Location + Climate + Vineyard + Age,
                                       trials = 192, splitplotdesign = temp, blocksizes = 4,
                                       varianceratio = 1, add_blocking_columns = TRUE)
}
#gen_design also supports user-defined optimality criterion. The user defines a function
#of the model matrix named customOpt, and gen_design will attempt to generate a design
#that maximizes that function. This function needs to be in the global environment, and be
#named either customOpt or customBlockedOpt, depending on whether a split-plot design is being
#generated. customBlockedOpt should be a function of the model matrix as well as the
#variance-covariance matrix, vInv. Due to the underlying C + + code having to call back to the R
#environment repeatedly, this criterion will be significantly slower than the built-in algorithms.
#It does, however, offer the user a great deal of flexibility in generating their designs.

#We are going to write our own D-optimal search algorithm using base R functions. Here, write
#a function that calculates the determinant of the information matrix. gen_design will search
#for a design that maximizes this function.

customOpt = function(currentDesign) {
 return(det(t(currentDesign) %*% currentDesign))
}

#Generate the whole plots for our split-plot design, using the custom criterion.

candlistcustom = expand.grid(Altitude = c(10000, 20000),
                            Range = as.factor(c("Close", "Medium", "Far")),
                            Power = c(50, 100))
htcdesign = gen_design(candlistcustom, model = ~Altitude + Range,
                      trials = 11, optimality = "CUSTOM", repeats = 10)

#Now define a function that is a function of both the model matrix,
#as well as the variance-covariance matrix vInv. This takes the blocking structure into account
#when calculating our determinant.

customBlockedOpt = function(currentDesign, vInv) {
 return(det(t(currentDesign) %*% vInv %*% currentDesign))
}

#And finally, calculate the design. This (likely) results in the same design had we chosen the
#"D" criterion.

design = gen_design(candlistcustom,
                   ~Altitude + Range + Power, trials = 33,
                   splitplotdesign = htcdesign, blocksizes = 3,
                   optimality = "CUSTOM", repeats = 10)

#gen_design can also augment an existing design. Input a dataframe of pre-existing runs
#to the `augmentdesign` argument. Those runs in the new design will be fixed, and gen_design
#will perform a search for the remaining `trials - nrow(augmentdesign)` runs.

candidateset = expand.grid(height = c(10, 20), weight = c(45, 55, 65), range = c(1, 2, 3))

design_to_augment = gen_design(candidateset, ~height + weight + range, 5)

#As long as the columns in the augmented design match the columns in the candidate set,
#this design can be augmented.

augmented_design = gen_design(candidateset,
                             ~height + weight + range, 16, augmentdesign = design_to_augment)

#A design's diagnostics can be accessed via the `get_optimality()` function:

get_optimality(augmented_design)

#And design attributes can be accessed with the `get_attribute()` function:

get_attribute(design)

#A correlation color map can be produced by calling the plot_correlation command with the output
#of gen_design()

if(skpr:::run_documentation()) {
plot_correlations(design2)
}

#A fraction of design space plot can be produced by calling the plot_fds command
if(skpr:::run_documentation()) {
plot_fds(design2)
}

#Evaluating the design for power can be done with eval_design, eval_design_mc (Monte Carlo)
#eval_design_survival_mc (Monte Carlo survival analysis), and
#eval_design_custom_mc (Custom Library Monte Carlo)

Generates Exponential Anticipated Coefficients

Description

Generates Exponential Anticipated Coefficients Solves the exponential link function mean = exp(beta0 + beta1 * x) such that mean = mean_low when x = -1, and mean = mean_high when x = +1. Equivalently, solves this set of equations for beta0 and beta1: mean_low = exp(beta0 - beta1) mean_high = exp(beta0 + beta1)

Usage

gen_exponential_anticoef(anticoef, mean_low, mean_high)

Arguments

anticoef

input anticipated coefficients

mean_low

The low value of the mean value (= 1/rate)

mean_high

The high value of the mean value (= 1/rate)

Value

Anticipated coefficients.


Approximate continuous moment matrix over a numeric bounding box

Description

Treats the min–max range of each numeric column as a bounding box and samples points uniformly to approximate the integral of the outer product of the model terms, weighted by whether the point is on the edge.

Usage

gen_momentsmatrix_constrained(
  candidate_set,
  formula = ~.,
  n_samples_per_dimension = 10,
  user_provided_high_res_candidateset = FALSE
)

Arguments

candidate_set

Candidate set

formula

Default '~ .'. Model formula specifying the terms.

n_samples_per_dimension

Default '10'. Number of samples to take per dimension when interpolating inside the convex hull.

Value

A matrix of size 'p x p' (where 'p' is the number of columns in the model matrix), approximating the continuous moment matrix.


Generates the moment matrix, assuming no constraints

Description

Returns number of levels prior to each parameter

Usage

gen_momentsmatrix_ideal(modelfactors, classvector)

Value

Returns a vector consisting of the number of levels preceding each parameter (including the intercept)


Generates Poisson Anticipated Coefficients

Description

Generates Poisson Anticipated Coefficients Solves the Poisson link function mean = exp(beta0 + beta1 * x) such that mean = mean_low when x = -1, and mean = mean_high when x = +1. Equivalently, solves this set of equations for beta0 and beta1: mean_low = exp(beta0 - beta1) mean_high = exp(beta0 + beta1)

Usage

gen_poisson_anticoef(anticoef, mean_low, mean_high)

Arguments

anticoef

input anticipated coefficients of the proper length for your model matrix

mean_low

The low value of the mean value

mean_high

The high value of the mean value

Value

Anticipated coefficients.


Generate Block Panel

Description

Generate Block Panel

Usage

generate_block_panel(any_htc)

Arguments

any_htc

Factor number

Value

Shiny UI


Generate Factor Input Panel

Description

Generate Factor Input Panel

Usage

generate_factor_input_panel(factor_n = 1, factor_input_cache = NULL)

Arguments

factor_n

Factor number

Value

Shiny UI


Generate Noise Block

Description

Generates the noise to be added in the REML power calculation

Usage

generate_noise_block(noise, groups, blockstructure)

Value

Noise vector


Generate Optimality Results

Description

Generate Optimality Results

Usage

generate_optimality_results(any_htc)

Arguments

any_htc

Factor number

Value

Shiny UI


Generate Hypothesis Matrix

Description

Generates hypothesis matrix for power calculation

Usage

genhypmatrix(parameters, levels, g)

Arguments

parameters

Number of parameters total in model

levels

Number of levels in parameter of interest

g

Number of levels/parameters preceding parameter of interest

Value

The parameter matrix L isolating the levels of parameter of interest


Generate Parameter Matrix

Description

Generates the parameter matrix L to isolate the levels of interest in the calculation of power

Usage

genparammatrix(parameters, levels, g)

Arguments

parameters

Number of parameters total in model.

levels

Number of levels in parameter of interest

g

Number of levels/parameters preceeding parameter of interest

Value

The parameter vector Q isolating the levels of parameter of interest


Get attribute values

Description

Returns one or more of underlying attributes used in design generation/evaluation

Usage

get_attribute(output, attr = NULL, round = TRUE)

Arguments

output

The output of either 'gen_design()' or 'eval_design()'/'eval_design_mc()“

attr

Default 'NULL'. Return just the specific value requested. Potential values are 'model.matrix' for model used, 'moments.matrix', 'variance.matrix', 'alias.matrix', 'correlation.matrix', and 'model' for the model used in the evaluation/generation of the design.

round

Default 'TRUE'. Rounds off values smaller than the magnitude '1e-15“ in the 'correlation.matrix' and 'alias.matrix' matrix attributes.

Value

A list of attributes.

Examples

# We can extract the attributes of a design from either the output of `gen_design()`
# or the output of `eval_design()`

factorialcoffee = expand.grid(cost = c(1, 2),
                             type = as.factor(c("Kona", "Colombian", "Ethiopian", "Sumatra")),
                             size = as.factor(c("Short", "Grande", "Venti")))

designcoffee = gen_design(factorialcoffee, ~cost + size + type, trials = 29,
                         optimality = "D", repeats = 100)

#Extract a list of all attributes
get_attribute(designcoffee)

#Get just one attribute
get_attribute(designcoffee,"model.matrix")

# Extract from `eval_design()` output
power_output = eval_design(designcoffee, model = ~cost + size + type,
                          alpha = 0.05, detailedoutput = TRUE)

get_attribute(power_output,"correlation.matrix")

Calculate block structure lengths

Description

Calculate block structure lengths

Usage

get_block_groups(existing_block_structure)

Arguments

existing_block_structure

character matrix from rownames, split by '.'

Value

List of numbers indicating split plot layer sizes


Gets or calculates the moment matrix, given a candidate set, a design, and a model

Description

Returns number of levels prior to each parameter

Usage

get_moment_matrix(
  design = NA,
  factors = NA,
  classvector = NA,
  candidate_set_normalized = NA,
  model = ~.,
  moment_sample_density = 20,
  high_resolution_candidate_set = NA
)

Value

Returns a vector consisting of the number of levels preceeding each parameter (including the intercept)


Get optimality values

Description

Returns a list of optimality values (or one value in particular).

Note: The choice of contrast will effect the 'G' efficiency value, and 'gen_design()' and 'eval_design()' by default set different contrasts ('contr.simplex' vs 'contr.sum').

Usage

get_optimality(output, optimality = NULL, calc_g = FALSE)

Arguments

output

The output of either gen_design or eval_design/eval_design_mc.

optimality

Default 'NULL'. Return just the specific optimality requested.

calc_g

Default 'FALSE'. Whether to calculate the g-efficiency.

Value

A dataframe of optimality conditions. 'D', 'A', and 'G' are efficiencies (value is out of 100). 'T' is the trace of the information matrix, 'E' is the minimum eigenvalue of the information matrix, 'I' is the average prediction variance, and 'Alias' is the trace of the alias matrix.

Examples

# We can extract the optimality of a design from either the output of `gen_design()`
# or the output of `eval_design()`

factorialcoffee = expand.grid(cost = c(1, 2),
                             type = as.factor(c("Kona", "Colombian", "Ethiopian", "Sumatra")),
                             size = as.factor(c("Short", "Grande", "Venti")))

designcoffee = gen_design(factorialcoffee, ~cost + size + type, trials = 29,
                         optimality = "D", repeats = 100)

#Extract a list of all attributes
get_optimality(designcoffee)

#Get just one attribute
get_optimality(designcoffee,"D")

# Extract from `eval_design()` output
power_output = eval_design(designcoffee, model = ~cost + size + type,
                          alpha = 0.05, detailedoutput = TRUE)

get_optimality(power_output)

Get Power Curve Warnings and Errors

Description

Gets the warnings and errors from 'calculate_power_curves()' output.

Usage

get_power_curve_output(power_curve)

Arguments

power_curve

The output from 'calculate_power_curves()'

Value

A list of data.frames containing warning/error information

Examples

#Generate sample
if(skpr:::run_documentation()) {
calculate_power_curves(trials=seq(50,150,by=20),
                      candidateset = expand.grid(x=c(-1,1),y=c(-1,1)),
                      model = ~.,
                      effectsize = list(c(0.5,0.9),c(0.6,0.9)),
                      eval_function = eval_design_mc,
                      eval_args = list(nsim = 100, glmfamily = "binomial"))
}

Interpolate points within a convex hull

Description

Interpolate points within a convex hull

Usage

interpolate_convex_hull(points, ch_halfspace, n_samples_per_dimension = 11)

Arguments

points

A numeric matrix (n x d).

ch_halfspace

Convex hull. Output of 'convhull_halfspace'

n_samples_per_dimension

Number of samples per dimension (pre filtering) to take

Value

A list with: - data: The interpolated points - on_edge: A logical vector indicating whether the points


Layer Interaction

Description

Determines if a factor is a intra-layer interaction

Usage

is_intralayer_interaction(design, model, split_layers)

Arguments

design

The design matrix

model

The model

split_layers

The layer of split plots for each main effect term

Value

List of booleans for each subplot strata layer


Determines if rendering in knitr

Description

Determines if rendering in knitr

Usage

is_rendering_in_knitr()

Value

boolean


Normalize Design

Description

Normalizes the numeric columns in the design to -1 to 1. This is important to do if your model has interaction or polynomial terms, as these terms can introduce multi-collinearity and standardizing the numeric columns can reduce this problem.

Normalizes the numeric columns in the design to -1 to 1. This is important to do if your model has interaction or polynomial terms, as these terms can introduce multi-collinearity and standardizing the numeric columns can reduce this problem.

Usage

normalize_design(design, augmented = NULL)

normalize_design(design, augmented = NULL)

Arguments

design

The design matrix.

augmented

Default 'NULL'. If

Value

Normalized design matrix

Normalized run matrix

Examples

#Normalize a design
if(skpr:::run_documentation()) {
cand_set = expand.grid(temp = c(100,300,500),
                      altitude = c(10000,20000),
                      offset = seq(-10,-5,by=1),
                      type = c("A","B", "C"))
design = gen_design(cand_set, ~., 24)

#Un-normalized design
design
}
if(skpr:::run_documentation()) {
#Normalized design
normalize_design(design)
}

Calculates parameter power

Description

Calculates parameter power

Usage

parameterpower(
  RunMatrix,
  levelvector = NULL,
  anticoef,
  alpha,
  vinv = NULL,
  degrees = NULL,
  parameter_names
)

Arguments

RunMatrix

The run matrix

levelvector

The number of levels in each parameter (1st is always the intercept)

anticoef

The anticipated coefficients

alpha

the specified type-I error

vinv

The V inverse matrix

Value

The parameter power for the parameters


Permutations

Description

Return permutations

Usage

permutations(n)

Arguments

n

Number of elements to permute

Value

Matrix of permuted element ids


Plots design diagnostics

Description

Plots design diagnostics

Usage

plot_correlations(
  genoutput,
  model = NULL,
  customcolors = NULL,
  pow = 2,
  custompar = NULL,
  standardize = TRUE,
  plot = TRUE
)

Arguments

genoutput

The output of either gen_design or eval_design/eval_design_mc

model

Default 'NULL'. Defaults to the model used in generating/evaluating the design, augmented with 2-factor interactions. If specified, it will override the default model used to generate/evaluate the design.

customcolors

A vector of colors for customizing the appearance of the colormap

pow

Default 2. The interaction level that the correlation map is showing.

custompar

Default NULL. Custom parameters to pass to the 'par' function for base R plotting.

standardize

Default 'TRUE'. Whether to standardize (scale to -1 and 1 and center) the continuous numeric columns. Not standardizing the numeric columns can increase multi-collinearity (predictors that are correlated with other predictors in the model).

plot

Default 'TRUE'. If 'FALSE', this will return the correlation matrix.

Value

Silently returns the correlation matrix with the proper row and column names.

Examples

#We can pass either the output of gen_design or eval_design to plot_correlations
#in order to obtain the correlation map. Passing the output of eval_design is useful
#if you want to plot the correlation map from an externally generated design.

#First generate the design:

candidatelist = expand.grid(cost = c(15000, 20000), year = c("2001", "2002", "2003", "2004"),
                           type = c("SUV", "Sedan", "Hybrid"))
cardesign = gen_design(candidatelist, ~(cost+type+year)^2, 30)
plot_correlations(cardesign)

#We can also increase the level of interactions that are shown by default.

plot_correlations(cardesign, pow = 3)

#You can also pass in a custom color map.
plot_correlations(cardesign, customcolors = c("blue", "grey", "red"))
plot_correlations(cardesign, customcolors = c("blue", "green", "yellow", "orange", "red"))

Fraction of Design Space Plot

Description

Creates a fraction of design space plot

Usage

plot_fds(
  genoutput,
  model = NULL,
  continuouslength = 1001,
  plot = TRUE,
  sample_size = 10000,
  yaxis_max = NULL,
  description = "Fraction of Design Space"
)

Arguments

genoutput

The design, or the output of the power evaluation functions. This can also be a list of several designs, which will result in all of them being plotted in a row (for easy comparison).

model

Default 'NULL'. The model, if 'NULL' it defaults to the model used in 'eval_design' or 'gen_design'.

continuouslength

Default '11'. The precision of the continuous variables. Decrease for faster (but less precise) plotting.

plot

Default 'TRUE'. Whether to plot the FDS, or just calculate the cumulative distribution function.

sample_size

Default '10000'. Number of samples to take of the design space.

yaxis_max

Default 'NULL'. Manually set the maximum value of the prediction variance.

description

Default 'Fraction of Design Space'. The description to add to the plot. If a vector and multiple designs passed to genoutput, it will be the description for each plot.

Value

Plots design diagnostics, and invisibly returns the vector of values representing the fraction of design space plot. If multiple designs are passed, this will return a list of all FDS vectors.

Examples

#We can pass either the output of gen_design or eval_design to plot_correlations
#in order to obtain the correlation map. Passing the output of eval_design is useful
#if you want to plot the correlation map from an externally generated design.

#First generate the design:

candidatelist = expand.grid(X1 = c(1, -1), X2 = c(1, -1))

design = gen_design(candidatelist, ~(X1 + X2), 15)

plot_fds(design)

Find potential permuted interactions

Description

Returns permuted interactions

Usage

potential_permuted_factors(x)

Arguments

x

character vector of interaction terms

Value

character vector of potential interaction terms


Print evaluation information

Description

Prints design evaluation information below the data.frame of power values

Note: If options("skpr.ANSI") is 'NULL' or 'TRUE', ANSI codes will be used during printing to prettify the output. If this is 'FALSE', only ASCII will be used.

Usage

## S3 method for class 'skpr_eval_output'
print(x, ...)

Arguments

x

The x of the evaluation functions in skpr

...

Additional arguments.

Examples

#Generate/evaluate a design and print its information
factorialcoffee = expand.grid(cost = c(1, 2),
                              type = as.factor(c("Kona", "Colombian", "Ethiopian", "Sumatra")),
                              size = as.factor(c("Short", "Grande", "Venti")))

designcoffee = gen_design(factorialcoffee,
                         ~cost + size + type, trials = 29, optimality = "D", repeats = 100)

eval_design(designcoffee)

Print evaluation information

Description

Prints design evaluation information below the data.frame of power values

Note: If options("skpr.ANSI") is 'NULL' or 'TRUE', ANSI codes will be used during printing to prettify the output. If this is 'FALSE', only ASCII will be used.

Usage

## S3 method for class 'skpr_power_curve_output'
print(x, ...)

Arguments

x

The x of the evaluation functions in skpr

...

Additional arguments.

Examples

#Generate/evaluate a design and print its information
factorialcoffee = expand.grid(cost = c(1, 2),
                              type = as.factor(c("Kona",
                                                 "Colombian",
                                                 "Ethiopian",
                                                 "Sumatra")),
                              size = as.factor(c("Short",
                                                 "Grande",
                                                 "Venti")))

coffee_curves = calculate_power_curves(candidateset = factorialcoffee,
                                      model = ~(cost + size + type)^2,
                                      trials = 30:40, plot_results = FALSE)
coffee_curves

Prior levels

Description

Returns number of levels prior to each parameter

Usage

priorlevels(levelvector)

Value

Returns a vector consisting of the number of levels preceeding each parameter (including the intercept)


quadratic

Description

quadratic

Usage

quad(formula)

Arguments

formula

The formula to be expanded

Value

Returns quadratic model from formula


Rearrange formula by order

Description

Rearrange higher order arithmatic terms to end of formula

Usage

rearrange_formula_by_order(model, data)

Arguments

model

Base model

Value

Rearranged model


Remove columns not in model

Description

Remove columns not in model

Usage

reduceRunMatrix(RunMatrix, model, first_run = TRUE)

Value

The reduced model matrix.


Remove skpr-generated blocking columns

Description

Remove skpr-generated REML blocking columns if present

Usage

remove_skpr_blockcols(RunMatrix)

Arguments

RunMatrix

The run matrix

Value

Run Matrix


Run Documentation

Description

Run Documentation

Usage

run_documentation()

Value

bool


Set up progressr handler

Description

Set up progressr handler

Usage

set_up_progressr_handler(msg_string, type_string)

Graphical User Interface for skpr

Description

skprGUI provides a graphical user interface to skpr, within R Studio.

Usage

skprGUI(
  browser = FALSE,
  return_app = FALSE,
  multiuser = FALSE,
  progress = TRUE
)

Arguments

browser

Default 'FALSE'. Whether to open the application in an external browser.

return_app

Default 'FALSE'. If 'TRUE', this will return the shinyApp object.

multiuser

Default 'FALSE'. If 'TRUE', this will turn off and disable multicore functionality and enable non-blocking operation.

progress

Default 'TRUE'. Whether to include a progress bar in the application. Note: if 'multiuser = TRUE', progress bars are turned on by default.

Examples

#Type `skprGUI()` to begin