Type: | Package |
Title: | Judd, McClelland, & Ryan Formatting for ANOVA Output |
Version: | 3.0.0 |
Date: | 2024-02-06 |
Description: | Produces ANOVA tables in the format used by Judd, McClelland, and Ryan (2017, ISBN: 978-1138819832) in their introductory textbook, Data Analysis. This includes proportional reduction in error and formatting to ease the transition between the book and R. |
License: | GPL (≥ 3) |
URL: | https://github.com/UCLATALL/supernova |
BugReports: | https://github.com/UCLATALL/supernova/issues |
Depends: | R (≥ 3.4.0) |
Imports: | cli, methods, pillar (≥ 1.5.0), purrr, rlang, stringr, tibble, vctrs |
Suggests: | car, covr, dplyr (≥ 1.0.0), ggplot2, lintr, lme4, magrittr, readr, remotes, testthat (≥ 2.1.0), tidyr, vdiffr |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
NeedsCompilation: | no |
Packaged: | 2024-02-07 02:13:20 UTC; adamblake |
Author: | Adam Blake |
Maintainer: | Adam Blake <adam@coursekata.org> |
Repository: | CRAN |
Date/Publication: | 2024-02-07 11:40:02 UTC |
ANOVA table with nicer column names.
Description
ANOVA table with nicer column names.
Usage
anova_tbl(model)
Arguments
model |
Value
An ANOVA table with standard column names.
Plotting method for pairwise objects.
Description
Plotting method for pairwise objects.
Usage
autoplot.pairwise(object, ...)
## S3 method for class 'pairwise'
plot(x, y, ...)
Arguments
object |
A |
... |
Additional arguments passed to the plotting geom. |
x |
A |
y |
Ignored, required for compatibility with the |
Details
This function requires an optional dependency: ggplot2. When this package is installed, calling
autoplot() or plot() on a pairwise object will generate a plot of the pairwise comparisons. The
plot will show the differences between the groups, with error bars representing the confidence
intervals. The x-axis will be labeled with the type of confidence interval used and the values of
the differences, and the y-axis will be labeled with the groups being compared. A dashed line at 0
is included to help visualize the differences.
Examples
if (require(ggplot2)) {
  # generate the plot immediately
  pairwise(lm(mpg ~ factor(am) + disp, data = mtcars), plot = TRUE)

  # or save the object and plot it later
  p <- pairwise(lm(mpg ~ factor(am) + disp, data = mtcars))
  plot(p)
}
Paste, Concatenate, add End-Of-Line and Print
Description
Paste, Concatenate, add End-Of-Line and Print
Usage
cat_line(...)
Arguments
... |
Character vectors to paste together. |
Value
None (invisible NULL).
Check that the arguments are compatible with the rest of the pairwise code.
Description
Check that the arguments are compatible with the rest of the pairwise code.
Usage
check_pairwise_args(fit, alpha)
check_aov_compat(fit)
check_not_empty(fit)
Arguments
fit |
|
alpha |
A single double value indicating the alpha to use for the tests. |
Functions
- check_aov_compat(): Ensure the model can be converted by aov()
- check_not_empty(): Check that the model is not the empty model
Drop a term from the given model
Description
This function is needed to re-fit the models for Type III SS. If you have a model with an
interaction term (e.g. y ~ a + b + a:b), when you try to refit without one of the lower-order
terms (e.g. y ~ a + a:b), lm() will add it back in. This function uses a fitting function
that operates underneath lm() to circumvent this behavior. (It is very similar to drop1().)
Usage
drop_term(fit, term)
Arguments
fit |
The model to update. |
term |
The term to drop from the model. |
Value
An object of the class lm.
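The behavior described above can be seen with base R alone. The following is a minimal sketch, not the package's implementation: it uses the built-in mtcars data and two factor predictors chosen purely for illustration, and shows that a formula omitting a lower-order term produces a fit equivalent to the full model.

```r
# Full two-way model: both main effects plus the interaction
full <- lm(mpg ~ factor(am) * factor(vs), data = mtcars)

# Attempt to leave out the main effect of vs while keeping the interaction
reduced <- lm(mpg ~ factor(am) + factor(am):factor(vs), data = mtcars)

# The interaction columns are re-coded so that both fits span the same
# column space: same number of coefficients, same residual sum of squares
length(coef(full)) == length(coef(reduced))
isTRUE(all.equal(deviance(full), deviance(reduced)))
```

Because the two fits are equivalent, comparing them says nothing about the dropped term, which is why a lower-level fitting routine is needed for Type III comparisons.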
Print the output of lm() with the fitted equation.
Description
Print the output of lm() with the fitted equation.
Usage
equation(x, digits = max(3L, getOption("digits") - 3L))
Arguments
x |
The fitted linear model to print. |
digits |
The minimal number of significant digits. |
Value
Invisibly return the fitted linear model.
Find the categorical variables in a model
Description
Find the categorical variables in a model
Usage
find_categorical_vars(fit)
Arguments
fit |
Value
A character vector of the categorical variables in the model. Note these are not terms, they are variables, e.g. interactions are not included here, only the variables they are comprised of.
We have to insert spaces where terms were removed from the part model.
Description
We have to insert spaces where terms were removed from the part model.
Usage
formula_string(obj, part, term)
Build a formula from terms
Description
Build a formula from terms
Usage
frm_build(lhs, rhs, env = parent.frame())
Arguments
lhs |
The outcome term for the left-hand side. |
rhs |
The terms for the right-hand side. |
env |
The environment to assign to the formula (defaults to calling environment). |
Value
The right-hand side terms are joined with +. Then, the right-hand side is joined
to the left and returned as a formula.
See Also
formula_extraction formula_expansion
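The joining described under Value can be approximated in base R. The helper below, build_formula(), is a hypothetical stand-in written for illustration only; it is not the package's frm_build() and may differ from it in edge cases.

```r
# Hypothetical stand-in for frm_build(), using base R only
build_formula <- function(lhs, rhs, env = parent.frame()) {
  # join the right-hand side terms with "+"
  rhs_joined <- paste(rhs, collapse = " + ")
  # join the two sides and convert the string to a formula in `env`
  as.formula(paste(lhs, "~", rhs_joined), env = env)
}

frm <- build_formula("mpg", c("hp", "wt", "hp:wt"))
frm
```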
Expand a formula
Description
Expand a formula
Usage
frm_expand(frm)
Arguments
frm |
A formula that may have compact terms like |
Value
The expanded formula where terms like a * b are expanded to a + b + a:b.
See Also
formula_building formula_extraction
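The kind of expansion described here can be reproduced with base R's terms() and reformulate(). This is a sketch of the idea under the assumption of a simple fixed-effects formula; it is not necessarily how frm_expand() is implemented.

```r
compact <- y ~ a * b

# terms() expands the compact notation into individual term labels
labels <- attr(terms(compact), "term.labels")
labels

# rebuild an explicit formula from the expanded labels
expanded <- reformulate(labels, response = "y")
expanded
```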
Extracting from formulae
Description
These tools extract parts from formulae. The only function that extracts from the left-hand
side is frm_outcome. The rest only extract from the right-hand side. The word term is used to
denote functions that extract full terms from the formula, whereas var denotes functions that
extract the variables the formula uses. For example, the formula y ~ a * b + (1 | group) has
terms a, b, a:b, and 1 | group. The same formula has variables a, b, and group.
Usage
frm_outcome(frm)
frm_terms(frm)
frm_interaction_terms(frm)
frm_fixed_terms(frm)
frm_random_terms(frm)
frm_vars(frm)
frm_random_vars(frm)
frm_fixed_vars(frm)
Arguments
frm |
The formula to extract values from |
Details
These tools are ONLY tested against models and formulae that are explicitly supported. See the README and test cases for more information.
Value
The function name and parameters should be descriptive enough (see Description above). The extracted parts are always strings.
See Also
formula_building formula_expansion
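The distinction between terms and variables can be illustrated with base R on the formula from the Description. The sketch below uses only terms() and all.vars(); the frm_* functions may handle more cases than this.

```r
frm <- y ~ a * b + (1 | group)

# outcome: the left-hand side of the formula, as a string
outcome <- deparse(frm[[2]])

# terms: full right-hand side terms, including the interaction and random term
term_labels <- attr(terms(frm), "term.labels")

# variables: the names used on the right-hand side
vars <- all.vars(frm[[3]])

outcome
term_labels
vars
```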
Remove a term or variable from the right-hand side of a formula
Description
Remove a term or variable from the right-hand side of a formula
Usage
frm_remove_term(frm, term)
frm_remove_var(frm, var)
Arguments
frm |
The formula to modify. |
term , var |
The term or variable to drop. |
Value
The formula with the term removed.
See Also
formula_building formula_expansion formula_extraction
Get the string representation of the formula.
Description
Get the string representation of the formula.
Usage
frm_string(frm)
Arguments
frm |
The formula (or something that can be coerced to a formula). |
Value
A character string of the formula.
Generate a List of Models for Computing Different Types of Sums of Squares
Description
This function will return a list of lists where the top-level keys (names) of the items indicate
the component of the full model (i.e. the term) that the generated models can be used to test. At
each of these keys is a list with both the complex and simple models that can be compared to
test the component. The complex models always include the target term, and the simple models
are identical to the complex except the target term is removed. Thus, when the models are
compared (e.g. using anova, except for Type III; see details below), the resulting values
will show the effect of adding the target term to the model. There are three generally used
approaches to determining what the appropriate comparison models should be, called Type I, II,
and III. See the sections below for more information on these types.
Usage
generate_models(model, type = 3)
## S3 method for class 'formula'
generate_models(model, type = 3)
## S3 method for class 'lm'
generate_models(model, type = 3)
Arguments
model |
The model to generate the models from, of the type |
type |
The type of sums of squares to calculate: - Use |
Value
A list of the augmented models for each term, where the associated term is the key for each model in the list.
Type I
For Type I SS, or sequential SS, each term is considered in order after the preceding terms are considered. Consider the example model
Y ~ A + B + A:B, where ":" indicates an interaction. To determine the Type I effect of A, we
would compare the model Y ~ A to the same model without the term: Y ~ NULL. For B, we compare
Y ~ A + B to Y ~ A; and for A:B, we compare Y ~ A + B + A:B to Y ~ A + B. Incidentally, the
anova() function that ships with the base installation of R computes Type I statistics.
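The sequential comparisons can be checked by hand against anova(). The sketch below uses mtcars with hp and factor(am) standing in for A and B (an arbitrary choice for illustration) and confirms that comparing each pair of nested models gives the same sum of squares as the sequential table.

```r
full <- lm(mpg ~ hp + factor(am) + hp:factor(am), data = mtcars)
seq_table <- anova(full)  # sequential (Type I) table from base R

# A: compare Y ~ A against the empty model
ss_hp <- anova(
  lm(mpg ~ 1, data = mtcars),
  lm(mpg ~ hp, data = mtcars)
)[["Sum of Sq"]][2]

# A:B: compare Y ~ A + B + A:B against Y ~ A + B
ss_int <- anova(
  lm(mpg ~ hp + factor(am), data = mtcars),
  full
)[["Sum of Sq"]][2]

all.equal(ss_hp, seq_table["hp", "Sum Sq"])
all.equal(ss_int, seq_table["hp:factor(am)", "Sum Sq"])
```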
Type II
For Type II SS, or hierarchical SS, each term is considered in the presence of all of the terms that do not include it. For example, consider an example three-way factorial model
Y ~ A + B + C + A:B + A:C + B:C + A:B:C, where ":" indicates an interaction. The effect of A is
found by comparing Y ~ B + C + B:C + A to Y ~ B + C + B:C (the only terms included are those
that do not include A). For B, the comparison models would be Y ~ A + C + A:C + B and
Y ~ A + C + A:C; for A:B, the models would be Y ~ A + B + C + A:C + B:C + A:B and
Y ~ A + B + C + A:C + B:C; and so on.
Type III
For Type III SS, or orthogonal SS, each term is considered in the presence of all of the other terms. For example, consider an example two-way factorial model
Y ~ A + B + A:B, where ":" indicates an interaction between the terms. The effect of A is found
by comparing Y ~ B + A:B + A to Y ~ B + A:B; for B, the comparison models would be
Y ~ A + A:B + B and Y ~ A + A:B; and for A:B, the models would be Y ~ A + B + A:B and
Y ~ A + B.
Unfortunately, anova() cannot be used to compare Type III models. anova() does not allow for
violation of the principle of marginality, which is the rule that interactions should only be
tested in the context of their lower order terms. When an interaction term is present in a model,
anova() will automatically add in the lower-order terms, making a model like Y ~ A + A:B
unable to be compared: it will add the lower-order term B, and thus use the model
Y ~ A + B + A:B instead. To get the appropriate statistics for Type III comparisons, use drop1()
with the full scope, i.e. drop1(model_fit, scope = . ~ .).
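As a brief illustration of the drop1() call mentioned above, the sketch below fits a model on mtcars (the predictors are chosen only for the example) and requests F tests for every term, including the lower-order ones.

```r
model_fit <- lm(mpg ~ hp * factor(am), data = mtcars)

# The full scope asks drop1() to drop each term in turn, even those that
# are marginal to the interaction
type_3 <- drop1(model_fit, scope = . ~ ., test = "F")
type_3

# The sum of squares for each dropped term
type_3[["Sum of Sq"]]
```

The first row of the result, labeled <none>, describes the full model, so its sum of squares is missing.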
Examples
# create all type 2 comparison models
models <- generate_models(
  lm(mpg ~ hp * factor(am), data = mtcars),
  type = 2
)

# compute the SS for the hp term
anova_hp <- anova(models$hp$simple, models$hp$complex)
anova_hp[["Sum of Sq"]][[2]]
Insert a row of data into a table.
Description
Insert a row of data into a table.
Usage
insert_row(df, insert_at, contents)
Arguments
df |
The original data.frame. |
insert_at |
The row in which to insert the data. |
contents |
The row of contents to insert (should be a vector of length
|
Value
The original data.frame with the row of data inserted.
Insert a horizontal rule in a table for pretty printing
Description
Insert a horizontal rule in a table for pretty printing
Usage
insert_rule(df, insert_at)
Arguments
df |
The original data.frame |
insert_at |
The row in which to insert the dashes. |
Value
The original data.frame with the horizontal rule inserted.
Get all pairs for a given vector
Description
The output of this function should match the pairs you get when you run TukeyHSD.
Usage
level_pairs(levels)
Arguments
levels |
The vector to get pairs for. It is called levels because it was written for the purpose of comparing levels of a factor to one another with multiple comparisons. |
Value
A tibble with two columns, group 1 and group 2, where each row is a unique pair.
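The set of unique pairs can be sketched with base R's combn(). The column names and the ordering within each pair below are assumptions made for illustration (the later level is placed first, in the spirit of TukeyHSD's "B-A" labels); the package's own output may be arranged differently.

```r
lvls <- c("A", "B", "C")

# every unordered pair of levels, one pair per row
pairs <- t(combn(lvls, 2))

# hypothetical column names, for illustration only
pair_table <- data.frame(group_1 = pairs[, 2], group_2 = pairs[, 1])
pair_table
```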
Remove cases with missing values.
Description
Remove cases with missing values.
Usage
listwise_delete(obj, vars)
## S3 method for class 'data.frame'
listwise_delete(obj, vars = names(obj))
## S3 method for class 'lm'
listwise_delete(obj, vars = all.vars(formula(obj)))
Arguments
obj |
The |
vars |
The variables to consider. |
Value
For data.frames, the vars are checked for missing values. If one is found on any of
the variables, the entire row is removed (list-wise deletion). For linear models, the model is
refit after the underlying data have been processed.
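List-wise deletion on a chosen set of variables can be sketched in base R with complete.cases(). The example uses the built-in airquality data and is only an approximation of what listwise_delete() does.

```r
vars <- c("Ozone", "Solar.R")

# keep only rows with no missing value on any of the chosen variables
keep <- complete.cases(airquality[vars])
cleaned <- airquality[keep, ]

# a model refit on the cleaned data uses every remaining row
refit <- lm(Ozone ~ Solar.R, data = cleaned)
c(before = nrow(airquality), after = nrow(cleaned), used = nobs(refit))
```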
Find and return the lower triangle of a matrix
Description
Same as lower.tri()
except it returns the values from the matrix (rather than a positional
matrix that lets you look up the values).
Usage
lower_tri(x, diag = FALSE)
Arguments
x |
a matrix or other R object with |
diag |
logical. Should the diagonal be included? |
Value
The values in the lower triangular part of the matrix.
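The difference from lower.tri() can be seen in base R: lower.tri() returns a logical matrix of positions, and indexing with it returns the values. The snippet below is an illustration of that idea, not the package source.

```r
m <- matrix(1:9, nrow = 3)

# positional (logical) matrix
lower.tri(m)

# values below the diagonal
m[lower.tri(m)]               # 2 3 6

# values on and below the diagonal
m[lower.tri(m, diag = TRUE)]  # 1 2 3 5 6 9
```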
Get the means and counts for each categorical term in the model
Description
Get the means and counts for each categorical term in the model
Usage
means_and_counts(fit, term)
Arguments
fit |
|
term |
If |
Value
A list of the means and counts for each level of each term.
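For a single categorical predictor, the means and counts per level can be computed directly in base R. This sketch uses transmission type (am) in mtcars as the grouping variable and does not reproduce the exact structure of the list that means_and_counts() returns.

```r
groups <- factor(mtcars$am)

# mean of the outcome within each level
group_means <- tapply(mtcars$mpg, groups, mean)

# number of cases within each level
group_counts <- table(groups)

group_means
group_counts
```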
Constructor for pairwise comparison tables
Description
Constructor for pairwise comparison tables
Usage
new_pairwise_tbl(tbl, term, fit, fwer, alpha, correction)
Arguments
tbl |
A |
term |
The term the table describes. |
fit |
The linear model the term comes from. |
fwer |
The family-wise error-rate for the group of tests in the table. |
alpha |
The alpha to use when computing the family-wise error-rate. |
correction |
The type of alpha correction the tests in the table use. |
Value
A tibble sub-classed as pairwise_comparison_tbl. These have custom printers and
retain their attributes when subsetted.
number vector
Description
This creates a formatted double vector. You can specify the number of digits you want the value to display after the decimal, and the underlying value will not change. Additionally, you can explicitly set whether scientific notation should be used and whether numbers less than 1 should contain a leading 0.
Usage
number(x = numeric(), digits = 3L, scientific = FALSE, leading_zero = TRUE)
is_number(x)
as_number(x)
Arguments
x |
|
digits |
The number of digits to display after the decimal point. |
scientific |
Whether the number should be represented with scientific notation (e.g. 1e2) |
leading_zero |
Whether a leading zero should be used on numbers less than 1 (e.g. .001) |
Value
An S3 vector of class supernova_number. It should behave like a double, but be
formatted consistently.
Examples
number(1:5, digits = 3)
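The idea of changing only the display, not the stored value, can be illustrated with base R's formatC(). This is a sketch of the formatting concept and does not use the supernova_number class.

```r
x <- 0.00123456

# three digits after the decimal point; x itself is unchanged
shown <- formatC(x, digits = 3, format = "f")
shown     # "0.001"

# the same display without the leading zero
no_zero <- sub("^(-?)0\\.", "\\1.", shown)
no_zero   # ".001"

x == 0.00123456
```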
Pad x to length of y
Description
Pad x to length of y
Usage
pad(x, y, after = length(x), pad = NA)
Arguments
x |
The vector to pad. |
y |
The vector with target length. |
after |
A subscript, after which the padding is to be appended. |
pad |
The value to pad the vector with. |
Value
The padded vector.
Pad x to a given output length
Description
Pad x to a given output length
Usage
pad_len(x, output_length, after = length(x), pad = NA)
Arguments
x |
The vector to pad. |
output_length |
The length to pad the vector to. |
after |
A subscript, after which the padding is to be appended. |
pad |
The value to pad the vector with. |
Value
The padded vector.
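Padding of this kind can be written with base R's append(). The function pad_to() below is a hypothetical re-implementation for illustration; its argument names mirror the Usage above, but it is not the package source.

```r
# Hypothetical sketch: pad `x` with `pad` until it has `output_length` elements
pad_to <- function(x, output_length, after = length(x), pad = NA) {
  n_missing <- max(output_length - length(x), 0)
  append(x, rep(pad, n_missing), after = after)
}

pad_to(1:3, 5)                       # 1 2 3 NA NA
pad_to(1:3, 5, after = 0, pad = 0L)  # 0 0 1 2 3
```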
Compute all pairwise comparisons between category levels
Description
This function is useful for generating and testing all pairwise comparisons of categorical terms
in a linear model. This can be done in base R using functions like pairwise.t.test and
TukeyHSD, but these functions are inconsistent both in their output format and their general
approach to pairwise comparisons. pairwise() will return a consistent table format, and will
make consistent decisions about how to calculate error terms and confidence intervals. See the
Details section below for more on how the models are tested (and why your output might not
match other functions).
Usage
pairwise(
fit,
correction = "Tukey",
term = NULL,
alpha = 0.05,
var_equal = TRUE,
plot = FALSE
)
pairwise_t(fit, term = NULL, alpha = 0.05, correction = "none")
pairwise_bonferroni(fit, term = NULL, alpha = 0.05)
pairwise_tukey(fit, term = NULL, alpha = 0.05)
Arguments
fit |
|
correction |
The type of correction (if any) to perform to maintain the family-wise
error-rate specified by |
term |
If |
alpha |
The family-wise error-rate to restrict the tests to. If "none" is given for
|
var_equal |
If |
plot |
Setting plot to TRUE will automatically call |
Details
For simple one-way models where a single categorical variable predicts an outcome, you will get
output similar to other methods of computing pairwise comparisons. Essentially, the differences
on the outcome between each of the groups defined by the categorical variable are compared with
the requested test, and their confidence intervals and p-values are adjusted by the requested
correction.
However, when more than two variables are entered into the model, the outcome will diverge somewhat from other methods of computing pairwise comparisons. For traditional pairwise tests you need to estimate an error term, usually by pooling the standard deviation of the groups being compared. This means that when you have other predictors in the model, their presence is ignored when running these tests. For the functions in this package, we instead compute the pooled standard error by using the mean squared error (MSE) from the full model fit.
Let's take a concrete example to explain that. If we are predicting a car's miles-per-gallon
(mpg) based on whether it has an automatic or manual transmission (am), we can create that
linear model and get the pairwise comparisons like this:
pairwise(lm(mpg ~ factor(am), data = mtcars))
The output of this code will have one table showing the comparison of manual and automatic transmissions with regard to miles-per-gallon. The pooled standard error is the same as the square root of the MSE from the full model.
In these data the am variable did not have any other values than automatic and manual, but
we can imagine situations where the predictor has more than two levels. In these cases, the
pooled SD would be calculated by taking the MSE of the full model (not of each group) and then
weighting it based on the size of the groups in question (divide by n).
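The relationship between the model MSE and the standard error of a group difference can be checked with base R. The formula below is the textbook standard error of a difference between two group means under equal variances; it is offered as an illustration and is not taken from the package source, whose exact computation (for example under a Tukey correction) may differ.

```r
fit <- lm(mpg ~ factor(am), data = mtcars)

# MSE of the full model: squared residual standard error
mse <- sigma(fit)^2

# group sizes for the two transmission types
n <- table(mtcars$am)

# standard error of the difference, weighting the MSE by group size
se_diff <- sqrt(mse * (1 / n[["0"]] + 1 / n[["1"]]))

# the same quantity as reported by lm() for the am coefficient
se_lm <- coef(summary(fit))["factor(am)1", "Std. Error"]
all.equal(se_diff, se_lm)
```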
To improve our model, we might add the car's displacement (disp) as a quantitative predictor:
pairwise(lm(mpg ~ factor(am) + disp, data = mtcars))
Note that the output still only has a table for am. This is because we can't do a pairwise
comparison using disp because there are no groups to compare. Most functions will drop or not
let you use this variable during pairwise comparisons. Instead, pairwise() uses the same
approach as in the 3+ groups situation: we use the MSE for the full model and then weight it by
the size of the groups being compared. Because we are using the MSE for the full model, the
effect of disp is accounted for in the error term even though we are not explicitly comparing
different displacements. Importantly, the interpretation of the outcome is different than in
other traditional t-tests. Instead of saying, "there is a difference in miles-per-gallon based
on the type of transmission," we must add that this difference is found "after accounting for
displacement."
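The effect of the covariate on the error term can be seen by comparing the MSE of the two models in base R. This sketch only inspects the fitted models; it does not call pairwise().

```r
fit_am <- lm(mpg ~ factor(am), data = mtcars)
fit_am_disp <- lm(mpg ~ factor(am) + disp, data = mtcars)

# MSE of each full model; disp absorbs variation that was previously error
c(am_only = sigma(fit_am)^2, with_disp = sigma(fit_am_disp)^2)

# the transmission difference, now adjusted for displacement
coef(summary(fit_am_disp))["factor(am)1", ]
```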
Value
A list of tables organized by the terms in the model. For each term (categorical terms only, as splitting on a continuous variable is generally uninformative), the table describes all of the pairwise-comparisons possible.
Paste together lines of text.
Description
The lines are joined together with a newline (\n) character.
Usage
paste_line(...)
Arguments
... |
Vectors of lines of text. |
Value
Check out the paste function for more information.
Refit a model, dropping any non-categorical terms.
Description
Refit a model, dropping any non-categorical terms.
Usage
refit_categorical(fit)
Arguments
fit |
Value
A linear model that only has categorical predictors.
Rename a column in a data frame
Description
Rename a column in a data frame
Usage
rename(data, col_name, replacement)
Arguments
data |
A data frame to modify. |
col_name |
A character vector of columns to rename. |
replacement |
A character vector of replacement column names. |
Value
Returns the renamed data frame.
Convert SS type parameter to the corresponding numeric value
Description
Convert SS type parameter to the corresponding numeric value
Usage
resolve_type(type)
Arguments
type |
The value to convert, either string or numeric. |
Value
The numeric value corresponding to the input.
A template for a row in an ANOVA table.
Description
A template for a row in an ANOVA table.
Usage
row_blank(
term = NA_character_,
description = NA_character_,
ss = NA_real_,
df = NA_integer_,
ms = ss/df,
f = NA_real_,
pre = NA_real_,
p = NA_real_
)
Arguments
term |
The name of the term the row describes. |
description |
An optional, short description of the term (pedagogical). |
ss |
The sum of squares for the term (defaults to blank) |
df |
The degrees of freedom the term uses (defaults to blank). |
ms |
The mean square for the term (defaults to |
f |
Fisher's F statistic for the term in the model (defaults to blank). |
pre |
The proportional reduction of error the term provides (defaults to blank). |
p |
The p-value of the F (and PRE) for the term in the model (defaults to blank). |
Value
A tibble_row of length 1 with all of the variables initialized.
Compute and construct an ANOVA table row for an error term
Description
Compute and construct an ANOVA table row for an error term
Usage
row_error(term, description, fit)
Arguments
term |
The name of the term the row describes (e.g. Error or Total). |
description |
An optional, short description of the term (pedagogical). |
fit |
The model we are describing error from. |
Value
A tibble_row with the properties initialized. The code has been written to be as
simple and understandable as possible. Please take a look at the source and offer any
suggestions for improvement!
Compute and construct an ANOVA table row for a term.
Description
"Term" is loosely defined here and is probably better understood as "everything in the table that is not an error row."
Usage
row_term(name, description, models, term)
Arguments
description |
An optional, short description of the term (pedagogical). |
models |
The models created by |
term |
The term to compute the row for. |
Value
A tibble_row with the properties initialized. The code has been written to be as
simple and understandable as possible. Please take a look at the source and offer any
suggestions for improvement!
Select terms based on the user's term specification
Description
Before returning the selection, ensure that the term we are subsetting on exists.
Usage
select_terms(fit, term = NULL)
Arguments
fit |
|
term |
If |
Value
A character vector of terms to run analyses on.
supernova
Description
An alternative set of summary statistics for ANOVA. Sums of squares, degrees of freedom, mean squares, and F value are all computed with Type III sums of squares, but for fully-between subjects designs you can set the type to I or II. This function adds to the output table the proportional reduction in error, an explicit summary of the whole model, separate formatting of p values, and is intended to match the output used in Judd, McClelland, and Ryan (2017).
Usage
supernova(fit, type = 3, verbose = TRUE)
## S3 method for class 'lm'
supernova(fit, type = 3, verbose = TRUE)
## S3 method for class 'lmerMod'
supernova(fit, type = 3, verbose = FALSE)
Arguments
fit |
A model fit by |
type |
The type of sums of squares to calculate (see |
verbose |
If |
Value
An object of the class supernova, which has a clean print method for displaying the
ANOVA table in the console as well as a named list:
tbl |
The ANOVA table as a |
fit |
|
models |
Models created by |
References
Judd, C. M., McClelland, G. H., & Ryan, C. S. (2017). Data Analysis: A Model Comparison Approach to Regression, ANOVA, and Beyond (3rd ed.). New York: Routledge. ISBN: 978-1138819832
Examples
supernova(lm(mpg ~ disp, data = mtcars))
change_p_decimals <- supernova(lm(mpg ~ disp, data = mtcars))
print(change_p_decimals, pcut = 8)
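The proportional reduction in error (PRE) for the whole model can be computed by hand from the sums of squares. The sketch below uses base R only and the standard definition of PRE as the share of total error removed by the model; it is meant to show what the quantity is, not how supernova() computes it.

```r
fit <- lm(mpg ~ disp, data = mtcars)

# error from the empty model (total) and from the fitted model
ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
ss_error <- deviance(fit)

# error removed by the model, as a share of the total
ss_model <- ss_total - ss_error
pre <- ss_model / ss_total
pre

# for the whole model, PRE coincides with R-squared
all.equal(pre, summary(fit)$r.squared)
```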
Update a model in the environment the model was created in
Description
stats::update() will perform the update in parent.frame() by default, but this can cause
problems when the update is called by another function (so the parent frame is no longer the
environment the user is in).
Usage
update_in_env(object, formula., ...)
Arguments
object |
An existing fit from a model function such as |
formula. |
Changes to the formula – see |
... |
Additional arguments to the call, or arguments with
changed values. Use |
Value
The updated model is returned.
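The problem described above can be reproduced in base R. In the sketch below, the function fit_in_function() and the variable local_cars are hypothetical names used for illustration, and evaluating the updated call in the formula's environment is one possible workaround, assumed here rather than taken from the package source.

```r
# A model fit inside a function, using data that only exists there
fit_in_function <- function() {
  local_cars <- mtcars
  lm(mpg ~ hp, data = local_cars)
}
fit <- fit_in_function()

# The default update evaluates the call where update() was called from,
# where `local_cars` is not visible, so it fails
default_result <- tryCatch(update(fit, . ~ . + wt), error = function(e) e)
inherits(default_result, "error")

# Build the updated call without evaluating it, then evaluate it in the
# environment the model formula was created in
new_call <- update(fit, . ~ . + wt, evaluate = FALSE)
updated <- eval(new_call, envir = environment(formula(fit)))
names(coef(updated))
```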
Extract the variables from a model formula
Description
Extract the variables from a model formula
Usage
variables(object)
## S3 method for class 'supernova'
variables(object)
## S3 method for class 'formula'
variables(object)
## S3 method for class 'lm'
variables(object)
## S3 method for class 'lmerMod'
variables(object)
Arguments
object |
Value
A list containing the outcome and predictor variables in the model.