Type: | Package |
Title: | High-Level Functions for Tabulating, Charting and Reporting Survey Data |
Version: | 3.1.0 |
Date: | 2025-04-05 |
Description: | Craft polished tables and plots in Markdown reports. Simply choose whether to treat your data as counts or metrics, and the package will automatically generate well-designed default tables and plots for you. Boiled down to the basics, with labeling features and simple interactive reports. All functions are 'tidyverse' compatible. |
URL: | https://github.com/strohne/volker, https://strohne.github.io/volker/ |
BugReports: | https://github.com/strohne/volker/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
LazyData: | true |
Imports: | stats, utils, rlang, lifecycle, tibble, dplyr, tidyr, tidyselect, ggplot2 (≥ 2.2.1), scales, base64enc, purrr, magrittr, skimr, broom, knitr, kableExtra, rmarkdown, psych, car, effectsize, heplots |
Depends: | R (≥ 4.2) |
Suggests: | tidyverse, remotes, usethis, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-04-05 20:30:45 UTC; Jakob |
Author: | Jakob Jünger |
Maintainer: | Jakob Jünger <jakob.juenger@uni-muenster.de> |
Repository: | CRAN |
Date/Publication: | 2025-04-05 20:50:02 UTC |
volker: High-Level Functions for Tabulating, Charting and Reporting Survey Data
Description
Craft polished tables and plots in Markdown reports. Simply choose whether to treat your data as counts or metrics, and the package will automatically generate well-designed default tables and plots for you. Boiled down to the basics, with labeling features and simple interactive reports. All functions are 'tidyverse' compatible.
Author(s)
Maintainer: Jakob Jünger jakob.juenger@uni-muenster.de (ORCID) [copyright holder]
Authors:
Henrieke Kotthoff henrieke.kotthoff@uni-muenster.de [contributor]
Other contributors:
Chantal Gärtner chantal.gaertner@uni-muenster.de (ORCID) [contributor]
See Also
Useful links:
Report bugs at https://github.com/strohne/volker/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Add an object to the report list
Description
Add an object to the report list
Usage
.add_to_vlkr_rprt(obj, chunks, tab = NULL)
Arguments
obj |
A new chunk (volker table, volker plot or character value). |
chunks |
The current report list. |
tab |
A tabsheet name or NULL. |
Value
A volker report object.
Insert a name-value-pair into an object attribute
Description
Insert a name-value-pair into an object attribute
Usage
.attr_insert(obj, key, name, value)
Arguments
obj |
The object. |
key |
The attribute key. |
name |
The name of a list item within the attribute. |
value |
The value of the list item. |
Value
The object with new attributes.
Transfer attributes from one to another object
Description
Transfer attributes from one to another object
Usage
.attr_transfer(to, from, keys)
Arguments
to |
The target object. |
from |
The source object. |
keys |
A character vector of attribute keys |
Value
The target object with the updated attributes.
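The idea of copying selected attributes between objects can be sketched in base R as follows. The helper name transfer_attrs and the attribute keys are hypothetical and only illustrate the principle; the package's internal implementation may differ.

```r
# Hypothetical helper, not part of volker: copy the attributes listed in
# `keys` from one object to another and leave all other attributes alone.
transfer_attrs <- function(to, from, keys) {
  for (key in keys) {
    attr(to, key) <- attr(from, key)
  }
  to
}

from <- structure(1:3, comment = "Age in years", limits = c(0, 99))
to   <- transfer_attrs(c(10, 20, 30), from, keys = "comment")

attr(to, "comment")  # copied from the source object
attr(to, "limits")   # NULL, because "limits" was not listed in keys
```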
Get the maximum density value in a density plot
Description
Useful for placing geoms in the center of density plots
Usage
.density_mode(data, col)
Arguments
data |
A tibble. |
col |
A tidyselect column. |
Value
The maximum density value.
Test whether correlations are different from zero
Description
Test whether correlations are different from zero
Usage
.effect_correlations(data, cols, cross, method = "pearson", labels = TRUE)
Arguments
data |
A tibble. |
cols |
The columns holding metric values. |
cross |
The columns holding metric values to correlate. |
method |
The output metrics, pearson = Pearson's R, spearman = Spearman's rho. The reported R square value is just squared Spearman's or Pearson's R. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
Value
A tibble with correlation results.
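As a rough illustration of what such a test involves, the following base R sketch runs stats::cor.test for both methods and squares the coefficient. The built-in mtcars data and the column names of the result are assumptions made for this example; they do not reproduce the package's output format.

```r
# Sketch using base R only; mtcars stands in for survey data.
x <- mtcars$mpg
y <- mtcars$wt

# Test whether the correlation differs from zero, once per method.
pearson  <- stats::cor.test(x, y, method = "pearson")
# Ties in the data trigger a warning about exact p-values for Spearman.
spearman <- suppressWarnings(stats::cor.test(x, y, method = "spearman"))

r <- unname(c(pearson$estimate, spearman$estimate))

result <- data.frame(
  method    = c("pearson", "spearman"),
  r         = r,
  r_squared = r^2,  # simply the squared coefficient
  p         = c(pearson$p.value, spearman$p.value)
)
print(result)
```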
Calculate NPMI
Description
Calculate NPMI (normalized pointwise mutual information)
Usage
.effect_npmi(data, col, cross, labels = TRUE, clean = TRUE, smoothing = 0, ...)
Arguments
data |
A tibble. |
col |
The column holding factor values. |
cross |
The column to correlate. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
smoothing |
Add pseudocount. Calculate the pseudocount based on the number of trials to apply Laplace's rule of succession. |
... |
Placeholder to allow calling the method with unused parameters from tab_counts. |
Value
A volker tibble.
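For readers unfamiliar with the measure, here is a minimal base R sketch of normalized pointwise mutual information for the cells of a contingency table. The helper npmi_table is hypothetical, the smoothing is simplified to a constant pseudocount per cell, and setting combinations that never occur to -1 is a common convention rather than confirmed package behaviour.

```r
# Hypothetical helper, not part of volker: NPMI for every cell of a
# two-way contingency table.
npmi_table <- function(tab, smoothing = 0) {
  tab  <- tab + smoothing              # simplified: constant pseudocount per cell
  p_xy <- tab / sum(tab)               # joint probabilities
  p_x  <- rowSums(p_xy)                # marginal probabilities of the rows
  p_y  <- colSums(p_xy)                # marginal probabilities of the columns
  pmi  <- log(p_xy / outer(p_x, p_y))  # pointwise mutual information
  out  <- pmi / -log(p_xy)             # normalize to the range [-1, 1]
  out[p_xy == 0] <- -1                 # assumed convention for empty cells
  out
}

tab <- table(
  gender  = c("f", "f", "m", "m", "f", "m"),
  adopter = c("yes", "yes", "no", "no", "yes", "no")
)

res <- npmi_table(tab)
print(res)
```

Values near 1 mark combinations that occur together more often than expected under independence, values near -1 mark combinations that rarely or never co-occur.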
Create a factor vector and preserve all attributes
Description
Create a factor vector and preserve all attributes
Usage
.factor_with_attr(x, levels = NULL)
Arguments
x |
The source value, usually a character vector |
levels |
The new levels |
Value
A factor vector with the new levels
Get plot size and resolution for the current output format from the config
Description
Get plot size and resolution for the current output format from the config
Usage
.get_fig_settings()
Value
A list with figure settings
Calculate IQR
Description
Calculate IQR
Usage
.iqr(x)
Arguments
x |
A numeric vector |
Value
The IQR
Knit volker plots
Description
Automatically calculates the plot height from chunk options and volker options.
Usage
.knit_plot(pl)
Arguments
pl |
A ggplot object with vlkr_options. The vlkr_options are added by .to_vlkr_plot() and provide information about the number of vertical items (rows) and the maximum. |
Details
Presumptions:
a screen resolution of 72dpi
a default plot width of 7 inches = 504px
a default page width of 700px (vignette) or 910px (report)
an optimal bar height of 40px for 910px wide plots. i.e. a ratio of 0.04
an offset of one bar above and one bar below
Value
Character string containing an HTML image tag, including the base64 encoded image.
Prepare markdown content for table rendering
Description
Prepare markdown content for table rendering
Usage
.knit_prepare(x, wrap = FALSE)
Arguments
x |
Markdown text. |
wrap |
Wrap text after the given number of characters. |
Value
Markdown text with line breaks and escaped special characters.
Knit volker tables
Description
Knit volker tables
Usage
.knit_table(df, ...)
Arguments
df |
Data frame. |
Value
Formatted table produced by kable.
Calculate outliers
Description
Calculate outliers
Usage
.outliers(x, k = 1.5)
Arguments
x |
A numeric vector. |
Value
A list of outliers.
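The default k = 1.5 suggests the usual interquartile range rule. Under that assumption, which the documentation above does not confirm, outlier detection can be sketched in base R like this. The helper name tukey_outliers is hypothetical.

```r
# Hypothetical helper assuming Tukey's rule; volker's internals may differ.
tukey_outliers <- function(x, k = 1.5) {
  q   <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  # Keep values below Q1 - k * IQR or above Q3 + k * IQR.
  x[!is.na(x) & (x < q[[1]] - k * iqr | x > q[[2]] + k * iqr)]
}

x   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)
out <- tukey_outliers(x)
print(out)
```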
Helper function: plot grouped bar chart
Description
Helper function: plot grouped bar chart
Usage
.plot_bars(
data,
category = NULL,
ci = FALSE,
scale = NULL,
limits = NULL,
numbers = NULL,
orientation = "horizontal",
base = NULL,
title = NULL
)
Arguments
data |
Data frame with the columns item, value, p, n and optionally p_item. If p_item is provided, the column width is generated according to the p_item value, resulting in a mosaic plot. |
category |
Category for filtering the data frame. |
ci |
Whether to plot error bars for 95% confidence intervals. Provide the columns ci.low and ci.high in data. |
scale |
Direction of the scale: 0 = no direction for categories, -1 = descending or 1 = ascending values. |
numbers |
The values to print on the bars: "n" (frequency), "p" (percentage) or both. |
orientation |
Whether to show bars (horizontal) or columns (vertical) |
base |
The plot base as character or NULL. |
title |
The plot title as character or NULL. |
Value
A ggplot object.
Helper function: plot cor and regression outputs
Description
Helper function: plot cor and regression outputs
Usage
.plot_cor(
data,
ci = TRUE,
base = NULL,
limits = NULL,
title = NULL,
label = NULL
)
Arguments
data |
Dataframe with the columns item and value. To plot error bars, add the columns low and high and set the ci parameter to TRUE. |
ci |
Whether to plot confidence intervals. Provide the columns low and high in data. |
base |
The plot base as character or NULL. |
limits |
The scale limits. |
title |
The plot title as character or NULL. |
label |
The y axis label. |
Value
A ggplot object.
Helper function: plot grouped line chart
Description
Helper function: plot grouped line chart
Usage
.plot_lines(data, scale = NULL, base = NULL, limits = NULL, title = NULL)
Arguments
data |
Dataframe with the columns item, value, and .cross |
scale |
Passed to the label scale function. |
base |
The plot base as character or NULL. |
limits |
The scale limits. |
title |
The plot title as character or NULL. |
Value
A ggplot object.
Helper function: scree plot
Description
Helper function: scree plot
Usage
.plot_scree(data, k = NULL, lab_x = NULL, lab_y = NULL)
Arguments
data |
Dataframe with the factor or cluster number in the first column and the metric in the second. |
k |
Provide one of the values in the first column to color points up to this value. |
lab_x |
Label of the x axis |
lab_y |
Label of the y axis |
Value
A vlkr_plot object
Helper function: plot grouped line chart by summarising values
Description
Helper function: plot grouped line chart by summarising values
Usage
.plot_summary(
data,
ci = FALSE,
scale = NULL,
base = NULL,
box = FALSE,
limits = NULL,
title = NULL
)
Arguments
data |
Dataframe with the columns item, value. |
ci |
Whether to plot confidence intervals of the means. |
scale |
Passed to the label scale function. |
base |
The plot base as character or NULL. |
box |
Whether to add boxplots. |
title |
The plot title as character or NULL. |
Value
A ggplot object.
Generate a cluster table and plot
Description
Generate a cluster table and plot
Usage
.report_cls(
data,
cols,
cross,
metric = FALSE,
...,
k = 2,
effect = FALSE,
title = TRUE
)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Not yet implemented. Optional, a grouping column (without quotes). |
metric |
Not yet implemented. When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE, to treat it as metric and calculate correlations. |
k |
Number of clusters to calculate. |
effect |
Not yet implemented. Whether to report statistical tests and effect sizes. |
title |
Add a plot title (default = TRUE). |
Value
A list containing a table and a plot volker report chunk.
Generate a factor table and plot
Description
Generate a factor table and plot
Usage
.report_fct(
data,
cols,
cross,
metric = FALSE,
...,
k = 2,
effect = FALSE,
title = TRUE
)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Not yet implemented. Optional, a grouping column (without quotes). |
metric |
Not yet implemented. When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE, to treat it as metric and calculate correlations. |
k |
Number of factors to calculate. |
effect |
Not yet implemented. Whether to report statistical tests and effect sizes. |
title |
Add a plot title (default = TRUE). |
Value
A list containing a table and a plot volker report chunk.
Generate an index table and plot
Description
Generate an index table and plot
Usage
.report_idx(
data,
cols,
cross,
metric = FALSE,
...,
effect = FALSE,
title = TRUE
)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Optional, a grouping column (without quotes). |
metric |
When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE, to treat it as metric and calculate correlations. |
effect |
Whether to report statistical tests and effect sizes. |
title |
Add a plot title (default = TRUE). |
Value
A list containing a table and a plot volker report chunk.
Split a metric column into categories based on the median
Description
Split a metric column into categories based on the median
Usage
.tab_split(data, col, labels = TRUE)
Arguments
data |
A data frame containing the column to be split. |
col |
The column to split. |
labels |
Logical; if |
Value
A data frame with the specified column converted into categorical labels based on its median value. The split threshold (median) is stored as an attribute of the column.
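A median split can be sketched in a few lines of base R. The group labels, the handling of values equal to the median, and the attribute name "split" are assumptions made for this example, not documented package behaviour.

```r
x <- c(18, 25, 31, 40, 52, 67)

# The median serves as the split threshold.
threshold <- stats::median(x)

# Assumption: values above the median form the "High" group,
# all other values the "Low" group.
groups <- factor(
  ifelse(x > threshold, "High", "Low"),
  levels = c("Low", "High")
)

# Store the threshold on the result (hypothetical attribute name).
attr(groups, "split") <- threshold

table(groups)
```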
Add vlkr_df class - that means, the data frame has been prepared
Description
Add vlkr_df class - that means, the data frame has been prepared
Usage
.to_vlkr_df(data, digits = NULL)
Arguments
data |
A tibble. |
Value
A tibble of class vlkr_df.
Add vlkr_list class
Description
Used to collect multiple tables in a list, e.g. from regression outputs
Usage
.to_vlkr_list(data, baseline = TRUE)
Arguments
data |
A list. |
baseline |
Whether to get the baseline. |
Value
A volker list.
Add the volker class and options
Description
Add the volker class and options
Usage
.to_vlkr_plot(
pl,
rows = NULL,
maxlab = NULL,
baseline = TRUE,
theme_options = TRUE
)
Arguments
pl |
A ggplot object. |
rows |
The number of items on the vertical axis. Will be automatically determined when NULL. For stacked bar charts, don't forget to set the group parameter, otherwise it won't work. |
maxlab |
The character length of the longest label to be plotted on the vertical axis. Will be automatically determined when NULL. |
baseline |
Whether to print a message about removed values. |
theme_options |
Enable or disable axis titles and text, by providing a list with any of the elements axis.text.x, axis.text.y, axis.title.x, axis.title.y set to TRUE or FALSE. By default, titles (=scale labels) are disabled and text (= the tick labels) are enabled. |
Value
A ggplot object with vlkr_plt class.
Add the vlkr_rprt class to an object
Description
Adding the class makes sure the appropriate printing function is applied in markdown reports.
Usage
.to_vlkr_rprt(chunks)
Arguments
chunks |
A list of character strings. |
Value
A volker report object: List of character strings with the vlkr_rprt class containing the parts of the report.
Add vlkr_tbl class
Description
Additionally, removes the skim_df class if present.
Usage
.to_vlkr_tab(data, digits = NULL, caption = NULL, baseline = NULL)
Arguments
data |
A tibble. |
digits |
Set the plot digits. If NULL (default), no digits are set. |
caption |
The caption printed above the table. |
baseline |
A base line printed below the table. |
Value
A volker tibble.
Calculate lower whisker in a boxplot
Description
Calculate lower whisker in a boxplot
Usage
.whisker_lower(x, k = 1.5)
Arguments
x |
A numeric vector. |
Value
The lower whisker value.
Calculate upper whisker in a boxplot
Description
Calculate upper whisker in a boxplot
Usage
.whisker_upper(x, k = 1.5)
Arguments
x |
A numeric vector. |
Value
The upper whisker value.
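In a conventional boxplot, the whiskers end at the most extreme observations that still lie within k times the interquartile range of the quartiles. Assuming that convention applies here, both whisker values can be sketched in base R. The helper name whiskers is hypothetical.

```r
# Hypothetical helper assuming the conventional boxplot definition.
whiskers <- function(x, k = 1.5) {
  x   <- x[!is.na(x)]
  q   <- stats::quantile(x, c(0.25, 0.75))
  iqr <- q[[2]] - q[[1]]
  c(
    lower = min(x[x >= q[[1]] - k * iqr]),  # smallest value inside the lower fence
    upper = max(x[x <= q[[2]] + k * iqr])   # largest value inside the upper fence
  )
}

w <- whiskers(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50))
print(w)
```

In this example the value 50 lies beyond the upper fence, so the upper whisker stops at 9.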
Resolution settings for plots
Description
Override with options(vlkr.fig.settings=list(html = list(dpi = 192, scale = 2, width = 910, pxperline = 15))).
Add a key for each output format when knitting a document.
You can override the width by setting vlkr.fig.width in the chunk options.
Usage
VLKR_FIG_SETTINGS
Format
An object of class list of length 2.
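The override mechanism relies on R's global options. The following sketch sets the option with the values shown in the description, reads it back, and restores the previous state. It only demonstrates base R's options() and getOption() and does not call any package function.

```r
# Set the option and remember the previous value for later restoring.
old <- options(vlkr.fig.settings = list(
  html = list(dpi = 192, scale = 2, width = 910, pxperline = 15)
))

# Read the settings back, as any consumer of the option could do.
settings <- getOption("vlkr.fig.settings")
settings$html$dpi
settings$html$width

# Restore the previous state.
options(old)
```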
Fill colors
Description
Override with options(vlkr.discrete.fill=list(c("purple"))).
Usage
VLKR_FILLDISCRETE
Format
An object of class list of length 3.
Gradient colors
Description
Override with options(vlkr.gradient.fill=list(c("white","black"))).
Usage
VLKR_FILLGRADIENT
Format
An object of class character of length 5.
Polarized colors
Description
Polarized colors
Usage
VLKR_FILLPOLARIZED
Format
An object of class character of length 5.
Maximum number of distinct values to determine whether a column selection contains only categorical values
Description
Override with options(vlkr.max.categories=10).
Usage
VLKR_MAX_CATEGORIES
Format
An object of class numeric of length 1.
Levels to remove from factors
Description
Override with options(vlkr.na.levels=c("Not answered")).
Usage
VLKR_NA_LEVELS
Format
An object of class character of length 4.
Numbers to remove from vectors
Description
Override with options(vlkr.na.numbers=c(-2,-9)).
Usage
VLKR_NA_NUMBERS
Format
An object of class numeric of length 3.
Output thresholds
Description
Output thresholds
Usage
VLKR_NORMAL_DIGITS
Format
An object of class numeric of length 1.
Wrapping threshold
Description
Override with options(vlkr.wrap.labels=20).
Override with options(vlkr.wrap.legend=10).
Override with options(vlkr.wrap.scale=10).
Override with options(vlkr.angle.value=30).
Override with options(vlkr.angle.threshold=10).
Usage
VLKR_PLOT_LABELWRAP
Format
An object of class numeric of length 1.
Alpha values
Description
Alpha values
Usage
VLKR_POINT_ALPHA
Format
An object of class numeric of length 1.
Shapes
Description
Shapes
Usage
VLKR_POINT_MEAN_SHAPE
Format
An object of class numeric of length 1.
Sizes
Description
Sizes
Usage
VLKR_POINT_SIZE
Format
An object of class numeric of length 1.
Word wrap separators
Description
Word wrap separators
Usage
VLKR_WRAP_SEPARATOR
Format
An object of class character of length 1.
Add cluster number to a data frame
Description
Clustering is performed using stats::kmeans.
Usage
add_clusters(data, cols, newcol = NULL, k = 2, method = "kmeans", clean = TRUE)
Arguments
data |
A dataframe. |
cols |
A tidy selection of item columns. |
newcol |
Name of the new cluster column as a character vector. Set to NULL (default) to automatically build a name from the common column prefix, prefixed with "cls_". |
k |
Number of clusters to calculate.
Set to NULL to output a scree plot for up to 10 clusters
and automatically choose the number of clusters based on the elbow criterion.
The within-sums of squares for the scree plot are calculated by
|
method |
The method as character value. Currently, only kmeans is supported.
All items are scaled before performing the cluster analysis using
|
clean |
Prepare data by data_clean. |
Value
The input tibble with additional column containing cluster values as a factor. The new column is prefixed with "cls_". The new column contains the fit result in the attribute stats.kmeans.fit. The names of the items used for clustering are stored in the attribute stats.kmeans.items. The clustering diagnostics (Within-Cluster and Between-Cluster Sum of Squares) are stored in the attribute stats.kmeans.wss.
Examples
library(volker)
ds <- volker::chatgpt
volker::add_clusters(ds, starts_with("cg_adoption"), k = 3)
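To make the underlying steps more concrete, the following base R sketch scales a few items, collects the total within-cluster sums of squares for 1 to 10 clusters (the values a scree plot would show), and fits a solution with three clusters. The built-in mtcars data, the seed, and nstart = 10 are assumptions for illustration and do not reproduce the package's exact procedure.

```r
set.seed(1)

# Scale the items so that each contributes equally to the distances.
items <- scale(mtcars[, c("mpg", "hp", "wt")])

# Total within-cluster sum of squares for k = 1..10 (scree plot input).
wss <- sapply(1:10, function(k) {
  stats::kmeans(items, centers = k, nstart = 10)$tot.withinss
})

# Fit the chosen solution and keep the assignment as a factor.
fit      <- stats::kmeans(items, centers = 3, nstart = 10)
clusters <- factor(fit$cluster)

print(round(wss, 1))
table(clusters)
```

The elbow criterion looks for the value of k after which the within-cluster sum of squares stops dropping sharply.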
Add PCA columns along with summary statistics (KMO and Bartlett test) to a data frame
Description
PCA is performed using psych::pca with varimax rotation.
Bartlett's test for sphericity is calculated with psych::cortest.bartlett.
The Kaiser-Meyer-Olkin (KMO) measure is computed using psych::KMO.
Usage
add_factors(data, cols, newcols = NULL, k = 2, method = "pca", clean = TRUE)
Arguments
data |
A dataframe. |
cols |
A tidy selection of item columns. |
newcols |
Names of the factor columns as a character vector. Must be the same length as k or NULL. Set to NULL (default) to automatically build a name from the common column prefix, prefixed with "fct_", postfixed with the factor number. |
k |
Number of factors to calculate.
Set to NULL to calculate eigenvalues for all components up to the number of items
and automatically choose k. Eigenvalues and the decision on k are calculated by
|
method |
The method as character value. Currently, only pca is supported. |
clean |
Prepare data by data_clean. |
Value
The input tibble with additional columns containing factor values. The new columns are prefixed with "fct_". The first new column contains the fit result in the attribute psych.pca.fit. The names of the items used for factor analysis are stored in the attribute psych.pca.items. The summary diagnostics (Bartlett test and KMO) are stored in the attribute psych.kmo.bartlett.
Examples
library(volker)
ds <- volker::chatgpt
volker::add_factors(ds, starts_with("cg_adoption"))
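The eigenvalues mentioned for the k parameter can be illustrated without the psych package: they are the eigenvalues of the items' correlation matrix. The sketch below uses the built-in mtcars data and the Kaiser criterion (eigenvalues greater than 1) as an assumed decision rule; the documentation above does not state which rule the package applies.

```r
# Base R sketch; mtcars stands in for a set of survey items.
items <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]

# Eigenvalues of the correlation matrix, one per component.
ev <- eigen(stats::cor(items))$values

# Assumed decision rule (Kaiser criterion): keep components
# with an eigenvalue greater than 1.
k <- sum(ev > 1)

print(round(ev, 2))
print(k)
```

The eigenvalues of a correlation matrix always sum to the number of items, which makes 1 a natural reference value.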
Calculate the mean value of multiple items
Description
Usage
add_index(data, cols, newcol = NULL, cols.reverse, clean = TRUE)
Arguments
data |
A dataframe. |
cols |
A tidy selection of item columns. |
newcol |
Name of the index as a character value. Set to NULL (default) to automatically build a name from the common column prefix, prefixed with "idx_". |
cols.reverse |
A tidy selection of columns with reversed codings. |
clean |
Prepare data by data_clean. |
Value
The input tibble with an additional column that contains the index values. The column contains the result of the alpha calculation in the attribute named "psych.alpha".
Examples
ds <- volker::chatgpt
volker::add_index(ds, starts_with("cg_adoption"))
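Conceptually, a mean index is the row-wise mean of the selected items. A base R sketch with made-up data and hypothetical column names looks as follows; it does not compute the reliability statistics that the package stores alongside the index.

```r
# Made-up example data with three items sharing the prefix "cg_".
ds <- data.frame(
  cg_a = c(1, 3, 5),
  cg_b = c(2, 3, 4),
  cg_c = c(3, 3, NA)
)

# Select the item columns by their common prefix.
items <- grep("^cg_", names(ds), value = TRUE)

# Row-wise mean; in base R, a missing item yields a missing index value.
ds$idx_cg <- rowMeans(ds[, items])

print(ds)
```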
Get configured na numbers
Description
Retrieves values either from the option or from the constant.
Usage
cfg_get_na_numbers(default = VLKR_NA_NUMBERS)
Arguments
default |
The default na numbers, if not explicitly provided by na.numbers or the options. |
Value
A vector with numbers that should be treated as NAs.
ChatGPT Adoption Dataset CG-GE-APR23
Description
A small random subset of data from a survey about ChatGPT adoption. The survey was conducted in April 2023 within the population of German Internet users.
Usage
chatgpt
Format
chatgpt
A data frame with 101 rows and 19 columns:
- case
A running case number
- adopter
Adoption groups inspired by Rogers' innovator typology.
- use_
Columns starting with use contain data about ChatGPT usage in different contexts.
- cg_activities
Text answers to the question, what the respondents do with ChatGPT.
- cg_adoption_
A scale consisting of items about advantages, fears, and social aspects. The scales match theoretical constructs inspired by Rogers' diffusion model and Davis' Technology Acceptance Model.
- sd_
Columns starting with sd contain sociodemographics of the respondents.
Details
Call codebook(volker::chatgpt) to see the items and answer options.
Source
Communication Department of the University of Münster (gehrau@uni-muenster.de).
Check whether a column exists and stop if not
Description
Check whether a column exists and stop if not
Usage
check_has_column(data, cols, msg = NULL)
Arguments
data |
A data frame. |
cols |
A tidyselection of columns. |
msg |
A custom error message if the check fails. |
Value
boolean Whether the column exists.
Check whether a column selection is categorical
Description
Check whether a column selection is categorical
Usage
check_is_categorical(data, cols, msg = NULL)
Arguments
data |
A data frame. |
cols |
A tidyselection of columns. |
msg |
A custom error message if the check fails. |
Value
boolean Whether the columns are categorical
Check whether the object is a dataframe
Description
Check whether the object is a dataframe
Usage
check_is_dataframe(obj, msg = NULL, stopit = TRUE)
Arguments
obj |
The object to test. |
msg |
Optional, a custom error message. |
stopit |
Whether to stop execution with an error message. |
Value
boolean Whether the object is a data.frame object.
Check whether a column selection is numeric
Description
Check whether a column selection is numeric
Usage
check_is_numeric(data, cols, msg = NULL)
Arguments
data |
A data frame. |
cols |
A tidyselection of columns. |
msg |
A custom error message if the check fails. |
Value
boolean Whether the columns are numeric.
Check whether a parameter value is from a valid set
Description
Check whether a parameter value is from a valid set
Usage
check_is_param(
value,
allowed,
allownull = FALSE,
allowmultiple = FALSE,
stopit = TRUE,
msg = NULL
)
Arguments
value |
A character value. |
allowed |
Allowed values. |
allownull |
Whether to allow NULL values. |
allowmultiple |
Whether to allow multiple values. |
stopit |
Whether to stop execution if the value is invalid. |
msg |
A custom error message if the check fails. |
Value
logical Whether the value is valid.
Get plot for clustering result
Description
Kmeans clustering is performed using add_clusters.
Usage
cluster_plot(
data,
cols,
newcol = NULL,
k = NULL,
method = NULL,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
cols |
A tidy selection of item columns or a single column with cluster values as a factor. If the column already contains a cluster result from add_clusters, it is used, and other parameters are ignored. If no cluster result exists, it is calculated with add_clusters. |
newcol |
Name of the new cluster column as a character vector. Set to NULL (default) to automatically build a name from the common column prefix, prefixed with "cls_". |
k |
Number of clusters to calculate.
Set to NULL to output a scree plot for up to 10 clusters
and automatically choose the number of clusters based on the elbow criterion.
The within-sums of squares for the scree plot are calculated by
|
method |
The method as character value. Currently, only kmeans is supported.
All items are scaled before performing the cluster analysis using
|
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
cluster_plot(data, starts_with("cg_adoption"), k = 2)
Get tables for clustering result
Description
Kmeans clustering is performed using add_clusters.
Usage
cluster_tab(
data,
cols,
newcol = NULL,
k = NULL,
method = "kmeans",
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
cols |
A tidy selection of item columns or a single column with cluster values as a factor. If the column already contains a cluster result from add_clusters, it is used, and other parameters are ignored. If no cluster result exists, it is calculated with add_clusters. |
newcol |
Name of the new cluster column as a character vector. Set to NULL (default) to automatically build a name from the common column prefix, prefixed with "cls_". |
k |
Number of clusters to calculate.
Set to NULL to output a scree plot for up to 10 clusters
and automatically choose the number of clusters based on the elbow criterion.
The within-sums of squares for the scree plot are calculated by
|
method |
The method as character value. Currently, only kmeans is supported.
All items are scaled before performing the cluster analysis using
|
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_metrics. |
Value
A volker list with three volker tabs: cluster centers, cluster counts, and clustering diagnostics.
Examples
library(volker)
data <- volker::chatgpt
cluster_tab(data, starts_with("cg_adoption"), k = 2)
Get variable and value labels from a data set
Description
Variable labels are extracted from their comment or label attribute. Variable values are extracted from factor levels, the labels attribute, numeric or boolean attributes.
Usage
codebook(data, cols, values = TRUE)
Arguments
data |
A tibble. |
cols |
A tidy variable selection to filter specific columns. |
values |
Whether to output values (TRUE) or only items (FALSE) |
Details
Value
A tibble with the columns:
item_name: The column name.
item_group: First part of the column name, up to an underscore.
item_class: The last class value of an item (e.g. numeric, factor).
item_label: The comment attribute of the column.
value_name: In case a column has numeric attributes, the attribute names.
value_label: In case a column has numeric attributes or T/F-attributes, the attribute values. In case a column has a levels attribute, the levels.
Examples
volker::codebook(volker::chatgpt)
Convert numeric values to string
Description
Convert numeric values to string
Usage
data_cat(data, cols)
Arguments
data |
A data frame containing the items to be converted. |
cols |
A tidy selection of columns to convert. |
Value
A data frame with the converted values
Prepare dataframe for the analysis
Description
Depending on the selected cleaning plan, for example, recodes residual values to NA.
Usage
data_clean(data, plan = "default", ...)
Arguments
data |
Data frame. |
plan |
The cleaning plan. Currently, only "default" is supported. See data_clean_default. |
... |
Other parameters passed to the appropriate cleaning function. |
Details
The tibble remembers whether it was already cleaned, so the cleaning plan is only applied once, in the first call.
Value
Cleaned data frame with vlkr_df class.
Examples
ds <- volker::chatgpt
ds <- data_clean(ds)
Prepare data originating from SoSci Survey or SPSS
Description
Preparation steps:
Remove the avector class from all columns (comes from SoSci and prevents combining vectors)
Recode residual factor values to NA (e.g. "NA nicht beantwortet")
Recode residual numeric values to NA (e.g. -9)
Usage
data_clean_default(data, remove.na.levels = TRUE, remove.na.numbers = TRUE)
Arguments
data |
Data frame |
remove.na.levels |
Remove residual values from factor columns.
Either a character vector with residual values or TRUE to use defaults in VLKR_NA_LEVELS.
You can also define or disable residual levels by setting the global option vlkr.na.levels
(e.g. |
remove.na.numbers |
Remove residual values from numeric columns.
Either a numeric vector with residual values or TRUE to use defaults in VLKR_NA_NUMBERS.
You can also define or disable residual values by setting the global option vlkr.na.numbers
(e.g. |
Details
The tibble remembers whether it was already prepared and the operations are only performed once in the first call.
Value
Data frame with vlkr_df class (the class is used to prevent double preparation).
Examples
ds <- volker::chatgpt
ds <- data_clean_default(ds)
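The two recoding steps can be sketched in base R. The example data are made up; the residual number -9 and the level "Not answered" are taken from the examples in this manual, not from the package's full default lists.

```r
# Made-up example data with residual codes.
ds <- data.frame(
  age    = c(23, -9, 41),
  gender = factor(c("female", "Not answered", "male"))
)

na_numbers <- c(-9)
na_levels  <- c("Not answered")

# Recode residual numeric values to NA.
ds$age[ds$age %in% na_numbers] <- NA

# Recode residual factor values to NA and drop the unused level.
ds$gender[ds$gender %in% na_levels] <- NA
ds$gender <- droplevels(ds$gender)

print(ds)
levels(ds$gender)
```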
Convert values to numeric values
Description
Convert values to numeric values
Usage
data_num(data, cols)
Arguments
data |
A data frame containing the items to be converted. |
cols |
A tidy selection of columns to convert. |
Value
A data frame with the converted values
Prepare data for calculation
Description
Clean data, check column selection, remove cases with missing values
Usage
data_prepare(
data,
cols,
cross,
cols.categorical,
cols.numeric,
cols.reverse,
clean = TRUE
)
Arguments
data |
Data frame to be prepared. |
cols |
The first column selection. |
cross |
The second column selection. |
cols.categorical |
A tidy selection of columns to be checked for categorical values. |
cols.numeric |
A tidy selection of columns to be converted to numeric values. |
cols.reverse |
A tidy selection of columns with reversed codings. |
clean |
Whether to clean data using data_clean. |
Value
Prepared data frame.
Examples
data <- volker::chatgpt
data_prepare(data, sd_age, sd_gender)
Reverse item values
Description
Reverse item values
Usage
data_rev(data, cols)
Arguments
data |
A data frame containing the items to be reversed. |
cols |
A tidy selection of columns to reverse. For example, if you want to calculate an index of the two items "I feel bad about this" and "I like it", both coded with 1=not at all to 5=fully agree, you need to reverse one of them to make the codings compatible. |
Value
A data frame with the specified items reversed.
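For a scale coded from 1 to 5, reversing an item means mapping 1 to 5, 2 to 4, and so on. A minimal base R sketch, using a hypothetical helper and assuming the scale limits are known, could look like this:

```r
# Hypothetical helper, not part of volker: mirror values within the limits.
reverse_items <- function(x, min = 1, max = 5) {
  (min + max) - x
}

like <- c(5, 4, 2)  # "I like it"
bad  <- c(1, 2, 5)  # "I feel bad about this", coded in the opposite direction

bad_rev <- reverse_items(bad)
print(bad_rev)

# After reversing, both items point in the same direction and can be averaged.
index <- rowMeans(cbind(like, bad_rev))
print(index)
```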
Remove missings and output a message
Description
Remove missings and output a message
Usage
data_rm_missings(data, cols)
Arguments
data |
Data frame. |
cols |
A tidy column selection. |
Value
Data frame.
Remove NA levels
Description
Remove NA levels
Usage
data_rm_na_levels(data, na.levels = TRUE, default = VLKR_NA_LEVELS)
Arguments
data |
Data frame |
na.levels |
Residual values to remove from factor columns.
Either a character vector with residual values or TRUE to use defaults in VLKR_NA_LEVELS.
You can define default residual levels by setting the global option vlkr.na.levels
(e.g. |
default |
The default na levels, if not explicitly provided by na.levels or the options. |
Value
Data frame
Remove NA numbers
Description
Remove NA numbers
Usage
data_rm_na_numbers(
data,
na.numbers = TRUE,
check.labels = TRUE,
default = VLKR_NA_NUMBERS
)
Arguments
data |
Data frame |
na.numbers |
Either a numeric vector with residual values or TRUE to use defaults in VLKR_NA_NUMBERS. You can also define residual values by setting the global option vlkr.na.numbers. |
check.labels |
Whether to only remove NA numbers that are listed in the attributes of a column. |
default |
The default na numbers, if not explicitly provided by na.numbers or the options. |
Value
Data frame
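The idea of residual numbers can be sketched in base R as follows. This is only an illustration under assumptions: the code -9 is a made-up residual value chosen for the example (the actual contents of VLKR_NA_NUMBERS are not stated here), and the label check controlled by check.labels is left out.

```r
# Hypothetical illustration, not the package code.
na_numbers <- c(-9)                 # assumed residual code for this example
age <- c(23, -9, 41, 35, -9)        # made-up values with two residual codes

# Replace every residual code with NA so it no longer enters calculations.
age_clean <- replace(age, age %in% na_numbers, NA)
print(age_clean)
```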
Remove negatives and output a warning
Description
Remove negatives and output a warning
Usage
data_rm_negatives(data, cols)
Arguments
data |
Data frame |
cols |
A tidy column selection |
Value
Data frame
Remove zero values, drop missings and output a message
Description
Remove zero values, drop missings and output a message
Usage
data_rm_zeros(data, cols)
Arguments
data |
Data frame. |
cols |
A tidy column selection. |
Value
Data frame.
Output effect sizes and test statistics for count data
Description
The type of effect size depends on the number of selected columns:
One categorical column: see effect_counts_one
Multiple categorical columns: see effect_counts_items
Cross tabulations:
One categorical column and one grouping column: see effect_counts_one_grouped
Multiple categorical columns and one grouping column: see effect_counts_items_grouped (not yet implemented)
Multiple categorical columns and multiple grouping columns: see effect_counts_items_grouped_items (not yet implemented)
By default, if you provide two column selections, the second column is treated as categorical. Setting the metric-parameter to TRUE will call the appropriate functions for correlation analysis:
One categorical column and one metric column: see effect_counts_one_cor (not yet implemented)
Multiple categorical columns and one metric column: see effect_counts_items_cor (not yet implemented)
Multiple categorical columns and multiple metric columns: see effect_counts_items_cor_items (not yet implemented)
Usage
effect_counts(data, cols, cross = NULL, metric = FALSE, clean = TRUE, ...)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Optional, a grouping column. The column name without quotes. |
metric |
When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE, to treat it as metric and calculate correlations. |
clean |
Prepare data by data_clean. |
... |
Other parameters passed to the appropriate effect function. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
effect_counts(data, sd_gender, adopter)
Test homogeneity of category shares for multiple items
Description
Performs a goodness-of-fit test and calculates the Gini coefficient for each item.
The goodness-of-fit test is calculated using stats::chisq.test.
Usage
effect_counts_items(data, cols, labels = TRUE, clean = TRUE, ...)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_counts. |
Value
A volker tibble with the following statistical measures:
- Gini coefficient: Gini coefficient, measuring inequality.
- n: Number of cases the calculation is based on.
- Chi-squared: Chi-Squared test statistic.
- p: p-value for the statistical test.
- stars: Significance stars based on p-value (*, **, ***).
Examples
library(volker)
data <- volker::chatgpt
effect_counts_items(data, starts_with("cg_adoption_adv"))
Correlate the values in multiple items with one metric column and output effect sizes and tests
Description
Not yet implemented. The future will come.
Usage
effect_counts_items_cor(data, cols, cross, clean = TRUE, ...)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The metric column. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_counts. |
Value
A volker tibble.
Correlate the values in multiple items with multiple metric columns and output effect sizes and tests
Description
Not yet implemented. The future will come.
Usage
effect_counts_items_cor_items(data, cols, cross, clean = TRUE, ...)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The metric target columns. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_counts. |
Value
A volker tibble.
Effect size and test for comparing multiple variables by a grouping variable
Description
Not yet implemented. The future will come.
Usage
effect_counts_items_grouped(data, cols, cross, clean = TRUE, ...)
Arguments
data |
A tibble containing item measures and grouping variable. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The column holding groups to compare. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_counts. |
Value
A volker tibble.
Effect size and test for comparing multiple variables by multiple grouping variables
Description
Not yet implemented. The future will come.
Usage
effect_counts_items_grouped_items(data, cols, cross, clean = TRUE, ...)
Arguments
data |
A tibble containing item measures and grouping variable. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The columns holding groups to compare. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_counts. |
Value
A volker tibble.
Test homogeneity of category shares
Description
Performs a goodness-of-fit test and calculates the Gini coefficient.
The goodness-of-fit test is calculated using stats::chisq.test.
Usage
effect_counts_one(data, col, clean = TRUE, ...)
Arguments
data |
A tibble. |
col |
The column holding factor values. |
clean |
Prepare data by data_clean |
... |
Placeholder to allow calling the method with unused parameters from effect_counts. |
Value
A volker tibble with the following statistical measures:
- Gini coefficient: Gini coefficient, measuring inequality.
- n: Number of cases the calculation is based on.
- Chi-squared: Chi-Squared test statistic.
- p: p-value for the statistical test.
- stars: Significance stars based on p-value (*, **, ***).
Examples
library(volker)
data <- volker::chatgpt
data |>
filter(sd_gender != "diverse") |>
effect_counts_one(sd_gender)
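To make the two reported measures more concrete, here is a minimal base R sketch of a goodness-of-fit test against equal shares together with one common definition of the Gini coefficient. The helper gini_sketch and the example counts are assumptions for illustration; the package may compute the coefficient differently.

```r
# Hypothetical helper, not part of volker: Gini coefficient of a vector of counts,
# using the rank-weighted formula on the sorted values.
gini_sketch <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Made-up category counts.
counts <- c(female = 40, male = 60)

# Goodness-of-fit test against equal category shares (the default of chisq.test).
fit <- stats::chisq.test(counts)

result <- list(
  gini = gini_sketch(counts),
  n = sum(counts),
  chi_squared = unname(fit$statistic),
  p = fit$p.value
)
print(result)
```

A Gini coefficient of 0 means all categories have the same share; larger values indicate that cases concentrate in fewer categories.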
Output test statistics and effect size from a logistic regression of one metric predictor
Description
Not yet implemented. The future will come.
Usage
effect_counts_one_cor(data, col, cross, clean = TRUE, labels = TRUE, ...)
Arguments
data |
A tibble. |
col |
The column holding factor values. |
cross |
The column holding metric values. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_counts. |
Value
A volker tibble.
Output test statistics and effect size for contingency tables
Description
Chi squared is calculated using stats::chisq.test.
If any cell contains fewer than 5 observations, the exact-parameter is set.
Usage
effect_counts_one_grouped(data, col, cross, clean = TRUE, ...)
Arguments
data |
A tibble. |
col |
The column holding factor values. |
cross |
The column holding groups to compare. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_counts. |
Details
Phi is derived from the Chi squared value by sqrt(fit$statistic / n).
Cramer's V is derived by sqrt(phi / (min(dim(contingency)[1], dim(contingency)[2]) - 1)).
Value
A volker tibble with the following statistical measures:
- Cramer's V: Effect size measuring the association between two variables.
- n: Number of cases the calculation is based on.
- Chi-squared: Chi-Squared test statistic.
- df: Degrees of freedom.
- p: p-value for the statistical test.
- stars: Significance stars based on p-value (*, **, ***).
Examples
library(volker)
data <- volker::chatgpt
effect_counts_one_grouped(data, adopter, sd_gender)
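For orientation, the following base R sketch computes a chi-squared test for a small contingency table and derives phi and Cramer's V using their textbook definitions (V = sqrt(chi-squared / (n * (k - 1))), with k the smaller table dimension). The table is made up, and the sketch does not claim to reproduce the package internals.

```r
# Made-up 2 x 2 contingency table (rows: adopter yes/no, columns: two groups).
tab <- matrix(c(30, 10, 10, 30), nrow = 2)

# Chi-squared test without continuity correction, so the statistic is easy to verify.
fit <- stats::chisq.test(tab, correct = FALSE)

n <- sum(tab)
k <- min(dim(tab))
chi_squared <- unname(fit$statistic)

# Textbook definitions of the two effect sizes.
phi <- sqrt(chi_squared / n)
cramers_v <- sqrt(chi_squared / (n * (k - 1)))

print(c(chi_squared = chi_squared, phi = phi, cramers_v = cramers_v))
```

For a 2 x 2 table, k - 1 equals 1, so phi and Cramer's V coincide.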
Output effect sizes and test statistics for metric data
Description
The calculations depend on the number of selected columns:
One metric column: see effect_metrics_one
Multiple metric columns: see effect_metrics_items
Group comparisons:
One metric column and one grouping column: see effect_metrics_one_grouped
Multiple metric columns and one grouping column: see effect_metrics_items_grouped
Multiple metric columns and multiple grouping columns: not yet implemented
By default, if you provide two column selections, the second column is treated as categorical. Setting the metric-parameter to TRUE will call the appropriate functions for correlation analysis:
Two metric columns: see effect_metrics_one_cor
Multiple metric columns and one metric column: see effect_metrics_items_cor
Two metric column selections: see effect_metrics_items_cor_items
Usage
effect_metrics(data, cols, cross = NULL, metric = FALSE, clean = TRUE, ...)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Optional, a grouping column (without quotes). |
metric |
When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE, to treat it as metric and calculate correlations. |
clean |
Prepare data by data_clean. |
... |
Other parameters passed to the appropriate effect function. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
effect_metrics(data, sd_age, sd_gender)
Test whether a distribution is normal for each item
Description
The test is calculated using stats::shapiro.test.
Usage
effect_metrics_items(data, cols, labels = TRUE, clean = TRUE, ...)
Arguments
data |
A tibble containing item measures. |
cols |
The columns holding metric values. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_metrics. |
Value
A volker table containing itemwise statistics:
- skewness: Measure of asymmetry in the distribution. A value of 0 indicates perfect symmetry.
- kurtosis: Measure of the "tailedness" of the distribution.
- W: W-statistic from the Shapiro-Wilk normality test.
- p: p-value for the statistical test.
- stars: Significance stars based on p-value (*, **, ***).
- normality: Interpretation of normality based on Shapiro-Wilk test.
Examples
library(volker)
data <- volker::chatgpt
effect_metrics_items(data, starts_with("cg_adoption"))
Output correlation coefficients for items and one metric variable
Description
The correlation is calculated using stats::cor.test.
Usage
effect_metrics_items_cor(
data,
cols,
cross,
method = "pearson",
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The column holding metric values to correlate. |
method |
The correlation method: "pearson" for Pearson's r or "spearman" for Spearman's rho. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_metrics. |
Value
A volker table containing itemwise correlations:
If method = "pearson":
- R-squared: Coefficient of determination.
- n: Number of cases the calculation is based on.
- Pearson's r: Correlation coefficient.
- ci low / ci high: Lower and upper bounds of the 95% confidence interval.
- df: Degrees of freedom.
- t: t-statistic.
- p: p-value for the statistical test, indicating whether the correlation differs from zero.
- stars: Significance stars based on the p-value (*, **, ***).
If method = "spearman":
- Spearman's rho is displayed instead of Pearson's r.
- S-statistic is used instead of the t-statistic.
Examples
library(volker)
data <- volker::chatgpt
effect_metrics_items_cor(
data, starts_with("cg_adoption_adv"), sd_age
)
Output correlation coefficients for multiple items
Description
The correlation is calculated using stats::cor.test.
Usage
effect_metrics_items_cor_items(
data,
cols,
cross,
method = "pearson",
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
Tidyselect item variables (e.g. starts_with...). |
method |
The correlation method: "pearson" for Pearson's r or "spearman" for Spearman's rho. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_metrics. |
Value
A volker table containing correlations.
If method = "pearson":
- R-squared: Coefficient of determination.
- n: Number of cases the calculation is based on.
- Pearson's r: Correlation coefficient.
- ci low / ci high: Lower and upper bounds of the 95% confidence interval.
- df: Degrees of freedom.
- t: t-statistic.
- p: p-value for the statistical test, indicating whether the correlation differs from zero.
- stars: Significance stars based on the p-value (*, **, ***).
If method = "spearman":
- Spearman's rho is displayed instead of Pearson's r.
- S-statistic is used instead of the t-statistic.
Examples
library(volker)
data <- volker::chatgpt
effect_metrics_items_cor_items(
data,
starts_with("cg_adoption_adv"),
starts_with("use"),
metric = TRUE
)
Compare groups for each item by calculating F-statistics and effect sizes
Description
The models are fitted using stats::lm.
ANOVA of type II is computed for each fitted model using car::Anova.
Eta Squared is calculated for each ANOVA result using effectsize::eta_squared.
Usage
effect_metrics_items_grouped(
data,
cols,
cross,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The column holding groups to compare. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_metrics. |
Value
A volker tibble with the following statistical measures:
- Eta-squared: Effect size indicating the proportion of variance in the dependent variable explained by the predictor.
- Eta: Root of Eta-squared, a standardized effect size.
- n: Number of cases the calculation is based on.
- F: F-statistic from the linear model.
- p: p-value for the statistical test.
- stars: Significance stars based on p-value (*, **, ***).
Examples
library(volker)
data <- volker::chatgpt
effect_metrics(data, starts_with("cg_adoption_"), adopter)
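The measures in the table can be approximated with base R alone. The sketch below uses the built-in iris data instead of survey items and derives Eta-squared from the sums of squares of stats::anova; with a single grouping factor this ratio equals the share of explained variance. It is a simplified stand-in under these assumptions, not the car::Anova and effectsize::eta_squared pipeline described above.

```r
# One metric outcome compared across the groups of a single factor (built-in data).
fit <- stats::lm(Sepal.Length ~ Species, data = iris)
aov_table <- stats::anova(fit)

# Sums of squares: first row is the grouping factor, second row the residuals.
ss <- aov_table[["Sum Sq"]]
eta_sq <- ss[1] / sum(ss)

result <- list(
  eta_squared = eta_sq,
  eta = sqrt(eta_sq),
  n = nrow(iris),
  F = aov_table[["F value"]][1],
  p = aov_table[["Pr(>F)"]][1]
)
print(result)
```

For several items, the same steps would be repeated once per item column.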
Compare groups for each item with multiple target items by calculating F-statistics and effect sizes
Description
Not yet implemented. The future will come.
Usage
effect_metrics_items_grouped_items(data, cols, cross, clean = TRUE, ...)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The grouping items. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_metrics. |
Value
A volker tibble.
Test whether a distribution is normal
Description
The test is calculated using stats::shapiro.test.
Usage
effect_metrics_one(data, col, labels = TRUE, clean = TRUE, ...)
Arguments
data |
A tibble. |
col |
The column holding metric values. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_metrics. |
Value
A volker list object with the following statistical measures:
- skewness: Measure of asymmetry in the distribution. A value of 0 indicates perfect symmetry.
- kurtosis: Measure of the "tailedness" of the distribution.
- W: W-statistic from the Shapiro-Wilk normality test.
- p: p-value for the statistical test.
- stars: Significance stars based on p-value (*, **, ***).
- normality: Interpretation of normality based on Shapiro-Wilk test.
Examples
library(volker)
data <- volker::chatgpt
effect_metrics_one(data, sd_age)
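The core of the normality check can be sketched with stats::shapiro.test on simulated values. The data, the seed, and the 5% cut-off used for the verbal interpretation are assumptions for this example; the wording of the package's normality column may differ.

```r
set.seed(42)                           # arbitrary seed for reproducibility
x <- rnorm(200, mean = 40, sd = 10)    # simulated values standing in for a metric column

fit <- stats::shapiro.test(x)

result <- list(
  W = unname(fit$statistic),
  p = fit$p.value,
  # Assumed reading rule for this sketch: p < .05 rejects the normality assumption.
  normality = if (fit$p.value < 0.05) "not normal" else "normal"
)
print(result)
```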
Test whether the correlation is different from zero
Description
The correlation is calculated using stats::cor.test.
Usage
effect_metrics_one_cor(
data,
col,
cross,
method = "pearson",
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding metric values. |
cross |
The column holding metric values to correlate. |
method |
The correlation method: TRUE or "pearson" for Pearson's r, "spearman" for Spearman's rho. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_metrics. |
Value
A volker table containing the requested statistics.
If method = "pearson":
- R-squared: Coefficient of determination.
- n: Number of cases the calculation is based on.
- Pearson's r: Correlation coefficient.
- ci low / ci high: Lower and upper bounds of the 95% confidence interval.
- df: Degrees of freedom.
- t: t-statistic.
- p: p-value for the statistical test, indicating whether the correlation differs from zero.
- stars: Significance stars based on the p-value (*, **, ***).
If method = "spearman":
- Spearman's rho is displayed instead of Pearson's r.
- S-statistic is used instead of the t-statistic.
Examples
library(volker)
data <- volker::chatgpt
effect_metrics_one_cor(data, sd_age, use_private, metric = TRUE)
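The listed statistics map closely onto the result of stats::cor.test. The sketch below, using simulated data and variable names invented for the example, shows where each value could be read from and how the test statistic differs between the two methods. It illustrates the underlying base R call, not the package's own table-building code.

```r
set.seed(1)                               # arbitrary seed for reproducibility
age <- round(runif(50, 18, 70))           # simulated ages
use <- 1 + 0.05 * age + rnorm(50)         # simulated usage score related to age

fit_pearson <- stats::cor.test(age, use, method = "pearson")
# exact = FALSE avoids warnings about ties in the rank-based test.
fit_spearman <- stats::cor.test(age, use, method = "spearman", exact = FALSE)

pearson <- list(
  r_squared = unname(fit_pearson$estimate)^2,
  n = length(age),
  r = unname(fit_pearson$estimate),
  ci_low = fit_pearson$conf.int[1],
  ci_high = fit_pearson$conf.int[2],
  df = unname(fit_pearson$parameter),
  t = unname(fit_pearson$statistic),
  p = fit_pearson$p.value
)
print(pearson)

# The Spearman result carries rho and an S statistic instead of r and t.
print(c(fit_spearman$estimate, fit_spearman$statistic))
```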
Output a regression table with estimates and macro statistics
Description
The regression output comes from stats::lm.
T-test is performed using stats::t.test.
Normality check is performed using stats::shapiro.test.
Equality of variances across groups is assessed using car::leveneTest.
Cohen's d is calculated using effectsize::cohens_d.
Usage
effect_metrics_one_grouped(
data,
col,
cross,
method = "lm",
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding metric values. |
cross |
The column holding groups to compare. |
method |
A character vector of methods, e.g. c("t.test","lm"). Supported methods are t.test (only valid if the cross column contains two levels) and lm (regression results). |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_metrics. |
Value
A volker list object containing volker tables with the requested statistics.
Regression table:
- estimate: Regression coefficient (unstandardized).
- ci low / ci high: Lower and upper bounds of the 95% confidence interval.
- se: Standard error of the estimate.
- t: t-statistic.
- p: p-value for the statistical test.
- stars: Significance stars based on p-value (*, **, ***).
Macro statistics:
- Adjusted R-squared: Adjusted coefficient of determination.
- F: F-statistic for the overall significance of the model.
- df: Degrees of freedom for the model.
- residual df: Residual degrees of freedom.
- p: p-value for the statistical test.
- stars: Significance stars based on p-value (*, **, ***).
If method = "t.test":
Shapiro-Wilk test (normality check):
- W: W-statistic from the Shapiro-Wilk normality test.
- p: p-value for the test.
- normality: Interpretation of the Shapiro-Wilk test.
Levene test (equality of variances):
- F: F-statistic from the Levene test for equality of variances between groups.
- p: p-value for Levene's test.
- variances: Interpretation of the Levene test.
Cohen's d (effect size):
- d: Standardized mean difference between the two groups.
- ci low / ci high: Lower and upper bounds of the 95% confidence interval.
t-test:
- method: Type of t-test performed (e.g., "Two Sample t-test").
- difference: Observed difference between group means.
- ci low / ci high: Lower and upper bounds of the 95% confidence interval.
- se: Estimated standard error of the difference.
- df: Degrees of freedom used in the t-test.
- t: t-statistic.
- p: p-value for the t-test.
- stars: Significance stars based on p-value (*, **, ***).
Examples
library(volker)
data <- volker::chatgpt
effect_metrics_one_grouped(data, sd_age, sd_gender)
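As a small worked example of the t-test branch, the base R sketch below compares two made-up groups and computes Cohen's d by hand with the pooled standard deviation. The package itself is documented as using effectsize::cohens_d; the manual formula here is an assumption chosen to keep the example free of extra dependencies.

```r
# Two made-up groups of equal size.
group_a <- c(1, 2, 3, 4, 5)
group_b <- c(3, 4, 5, 6, 7)

# Student's t-test with pooled variance (var.equal = TRUE).
fit <- stats::t.test(group_a, group_b, var.equal = TRUE)

# Cohen's d: mean difference divided by the pooled standard deviation.
n_a <- length(group_a)
n_b <- length(group_b)
sd_pooled <- sqrt(((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) / (n_a + n_b - 2))
cohens_d <- (mean(group_a) - mean(group_b)) / sd_pooled

result <- list(
  difference = mean(group_a) - mean(group_b),
  df = unname(fit$parameter),
  t = unname(fit$statistic),
  p = fit$p.value,
  d = cohens_d
)
print(result)
```

With more than two groups, only the regression output applies, since the t-test is documented as valid for two levels only.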
Select variables by their postfix
Description
See tidyselect::ends_with for details.
Get plot with factor analysis result
Description
PCA is performed using add_factors.
Usage
factor_plot(
data,
cols,
newcols = NULL,
k = 2,
method = "pca",
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A dataframe. |
cols |
A tidy selection of item columns. If the first column already contains a pca from add_factors, the result is used. Other parameters are ignored. If there is no pca result yet, it is calculated by add_factors first. |
newcols |
Names of the factor columns as a character vector. Must be the same length as k or NULL. Set to NULL (default) to automatically build a name from the common column prefix, prefixed with "fct_", postfixed with the factor number. |
k |
Number of factors to calculate.
Set to NULL to generate a scree plot with eigenvalues for all components up to the number of items
and automatically choose k. Eigenvalues and the decision on k are calculated by
|
method |
The method as character value. Currently, only pca is supported. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A ggplot object.
Examples
library(volker)
ds <- volker::chatgpt
volker::factor_plot(ds, starts_with("cg_adoption"), k = 3)
Get tables with factor analysis results
Description
PCA is performed using add_factors.
Usage
factor_tab(
data,
cols,
newcols = NULL,
k = 2,
method = "pca",
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A dataframe. |
cols |
A tidy selection of item columns. If the first column already contains a pca result from add_factors, the result is used. Other parameters are ignored. If there is no pca result yet, it is calculated by add_factors first. |
newcols |
Names of the new factor columns as a character vector. Must be the same length as k or NULL. Set to NULL (default) to automatically build a name from the common column prefix, prefixed with "fct_", postfixed with the factor number. |
k |
Number of factors to calculate.
Set to NULL to report eigenvalues for all components up to the number of items
and automatically choose k. Eigenvalues and the decision on k are calculated by
|
method |
The method as character value. Currently, only pca is supported. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_metrics. |
Value
A volker list with three volker tabs: loadings, variances and diagnostics.
Examples
library(volker)
ds <- volker::chatgpt
volker::factor_tab(ds, starts_with("cg_adoption"), k = 3)
Filter function
Description
See dplyr::filter for details.
Get number of items and Cronbach's alpha of a scale added by add_index()
Description
TODO: Rename to index_tab, return volker list as in factor_tab()
Usage
get_alpha(data)
Arguments
data |
A data frame column. |
Value
A named list with the keys "items" and "alpha".
Angle labels
Description
Calculate angle for label adjustment based on character length.
Usage
get_angle(
labels,
threshold = VLKR_PLOT_ANGLE_THRESHOLD,
angle = VLKR_PLOT_ANGLE_VALUE
)
Arguments
labels |
Vector of labels to check. The values are converted to characters. |
threshold |
Length threshold beyond which the angle is applied.
Default is 20. Override with |
angle |
The angle to apply if any label exceeds the threshold.
Default is 45. Override with |
Value
A single angle value.
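The rule described above can be sketched in a few lines of base R. The helper name angle_sketch is hypothetical, the defaults 20 and 45 are taken from the argument descriptions, and returning 0 when no label exceeds the threshold is an assumption of this sketch.

```r
# Hypothetical helper, not part of volker.
angle_sketch <- function(labels, threshold = 20, angle = 45) {
  # Labels are converted to characters before measuring their length.
  too_long <- any(nchar(as.character(labels)) > threshold, na.rm = TRUE)
  if (too_long) angle else 0
}

short_labels <- c("yes", "no", "maybe")
long_labels <- c("yes", "I have never heard of this before")

print(angle_sketch(short_labels))
print(angle_sketch(long_labels))
```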
Get a formatted baseline for removed zero, negative, and missing cases and include focus category information if present
Description
Get a formatted baseline for removed zero, negative, and missing cases and include focus category information if present
Usage
get_baseline(obj)
Arguments
obj |
An object with the missings and focus attributes. |
Value
A formatted message or NULL if missings and focus attributes are not present.
Calculate ci values to be used for error bars on a plot
Description
Calculate ci values to be used for error bars on a plot
Usage
get_ci(x, conf = 0.95)
Arguments
x |
A numeric vector. |
conf |
The confidence level. |
Value
A named list with values for y, ymin, and ymax.
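A confidence interval for error bars is commonly built from the mean and a t-based margin. The following sketch returns the same three names as described above; the helper name ci_sketch and the choice of a t-distribution are assumptions, since the exact method is not spelled out here.

```r
# Hypothetical helper, not part of volker: mean with a t-based confidence interval.
ci_sketch <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  se <- stats::sd(x) / sqrt(n)
  margin <- stats::qt(1 - (1 - conf) / 2, df = n - 1) * se
  list(y = m, ymin = m - margin, ymax = m + margin)
}

ci <- ci_sketch(c(2, 4, 4, 5, 5, 7))
print(ci)
```

A list of this shape can be passed to plotting layers that expect y, ymin and ymax, for example via ggplot2::stat_summary(fun.data = ...).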
Detect whether a scale is a numeric sequence
Description
From all values in the selected columns, the numbers are extracted. If no numeric values can be found, returns 0. Otherwise, if any positive values form an ascending sequence, returns -1. In all other cases, returns 1.
Usage
get_direction(data, cols, extract = TRUE)
Arguments
data |
The dataframe. |
cols |
The tidy selection. |
extract |
Whether to extract numeric values from characters. |
Value
0 = an undirected scale, -1 = descending values, 1 = ascending values.
Calculate Eta squared
Description
Calculate Eta squared
Usage
get_etasq(fit)
Arguments
fit |
A model |
Value
A data frame with at least the column Eta2
Calculate the Gini coefficient
Description
Calculate the Gini coefficient
Usage
get_gini(x)
Arguments
x |
A vector of counts or other values |
Value
The Gini coefficient.
Get the labels of values from a codebook
Description
Get the labels of values from a codebook
Usage
get_labels(codes, values)
Arguments
codes |
The codebook as it results from the codebook() function |
values |
A vector of labels |
Value
The labels. If the values are not present in the codebook, returns the values.
Get the numeric range from the labels
Description
Gets the range of all values in the selected columns by the first successful of the following methods:
Usage
get_limits(data, cols, negative = TRUE)
Arguments
data |
The labeled data frame. |
cols |
A tidy variable selection. |
negative |
Whether to include negative values. |
Details
Inspect the limits column attribute.
Lookup the value names in the codebook.
Calculate the range from all values in the columns.
Value
A list or NULL.
Get the common prefix of character values
Description
Helper function taken from the Biobase package. Duplicated here instead of loading the package to avoid overhead. See https://github.com/Bioconductor/Biobase
Usage
get_prefix(x, ignore.case = FALSE, trim = FALSE, delimiters = c(":", "\n"))
Arguments
x |
Character vector. |
ignore.case |
Whether case matters (default). |
trim |
Whether non alphabetic characters should be trimmed. |
delimiters |
A list of prefix delimiters.
If any of the delimiters is present in the extracted prefix,
the part after is removed from the prefix.
Consider the following two items as an example:
"Usage: in " , but it makes more sense to break it after the colon. |
Value
The longest common prefix of the strings.
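The basic idea of a longest common prefix can be illustrated as follows. This simplified sketch compares the strings character by character and leaves out the ignore.case, trim and delimiters handling; the helper name common_prefix_sketch is hypothetical and the code is not the Biobase or volker implementation.

```r
# Hypothetical helper, not part of volker: longest common prefix of a character vector.
common_prefix_sketch <- function(x) {
  x <- as.character(x)
  if (length(x) == 0) return("")
  n <- min(nchar(x))
  prefix_len <- 0
  for (i in seq_len(n)) {
    # Stop at the first position where the strings disagree.
    if (length(unique(substr(x, i, i))) > 1) break
    prefix_len <- i
  }
  substr(x[1], 1, prefix_len)
}

print(common_prefix_sketch(c("use_private", "use_work")))
print(common_prefix_sketch(c("alpha", "beta")))
```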
Get significance stars from p values
Description
Get significance stars from p values
Usage
get_stars(x)
Arguments
x |
A vector of p values. |
Value
A character vector with significance stars.
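A typical mapping from p values to stars looks like the sketch below. The thresholds .05, .01 and .001 are the conventional ones and are an assumption here, as is the helper name stars_sketch; the cut-offs used by the package are not documented in this section.

```r
# Hypothetical helper, not part of volker; thresholds are the conventional ones.
stars_sketch <- function(p) {
  ifelse(is.na(p), NA_character_,
    ifelse(p < 0.001, "***",
      ifelse(p < 0.01, "**",
        ifelse(p < 0.05, "*", ""))))
}

stars <- stars_sketch(c(0.0005, 0.005, 0.02, 0.2))
print(stars)
```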
Get a common title for a column selection
Description
Get a common title for a column selection
Usage
get_title(data, cols, default = NULL)
Arguments
data |
A tibble. |
cols |
A tidy column selection. |
default |
A character string used in case no prefix is found. |
Value
A character string.
Volker style HTML document format
Description
Based on the standard theme, tweaks the pill navigation to switch between tables and plots.
To use the format, set output: volker::html_report in the header of your Markdown document.
Usage
html_report(...)
Arguments
... |
Additional arguments passed to html_document. |
Value
R Markdown output format.
Examples
## Not run:
# Add `volker::html_report` to the output options of your Markdown document:
#
# ```
# ---
# title: "How to create reports?"
# output: volker::html_report
# ---
# ```
## End(Not run)
Deprecated Alias for add_index
Description
idx_add() was renamed to add_index().
Usage
idx_add(data, cols, newcol = NULL, reverse = NULL, clean = TRUE)
Details
This function is a deprecated alias for add_index.
Printing method for volker plots when knitting
Description
Printing method for volker plots when knitting
Usage
## S3 method for class 'vlkr_plt'
knit_print(x, ...)
Arguments
x |
The volker plot. |
... |
Further parameters passed to print(). |
Value
Knitr asis output
Examples
library(volker)
data <- volker::chatgpt
pl <- plot_metrics(data, sd_age)
print(pl)
Wrap labels in plot scales
Description
Wrap labels in plot scales
Usage
label_scale(x, scale)
Arguments
x |
The label vector. |
scale |
A named label vector to select elements that should be wrapped. Prevents numbers from being wrapped. |
Value
A vector of wrapped labels.
Set column and value labels
Description
Usage
labs_apply(data, codes = NULL, cols = NULL, items = TRUE, values = TRUE)
Arguments
data |
A tibble containing the dataset. |
codes |
A tibble in codebook format. |
cols |
A tidy column selection. Set to NULL (default) to apply to all columns found in the codebook. Restricting the columns is helpful when you want to set value labels. In this case, provide a tibble with value_name and value_label columns and specify the columns that should be modified. |
items |
If TRUE, column labels will be retrieved from the codes (the default). If FALSE, no column labels will be changed. Alternatively, a named list of column names with their labels. |
values |
If TRUE, value labels will be retrieved from the codes (default). If FALSE, no value labels will be changed. Alternatively, a named list of value names with their labels. In this case, use the cols-Parameter to define which columns should be changed. |
Details
You can either provide a data frame in codebook format to the codes-parameter or provide named lists to the items- or values-parameter.
When working with a codebook in the codes-parameter:
Change column labels by providing the columns item_name and item_label in the codebook. Set the items-parameter to TRUE (the default setting).
Change value labels by providing the columns value_name and value_label in the codebook. To tell which columns should be changed, you can either use the item_name column in the codebook or use the cols-parameter. For factor values, the levels and their order are retrieved from the value_label column. For coded values, labels are retrieved from both the columns value_name and value_label.
When working with lists in the items- or values-parameter:
Change column labels by providing a named list to the items-parameter. The list contains labels named by the columns. Set the parameters codes and cols to NULL (their default value).
Change value labels by providing a named list to the values-parameter. The list contains labels named by the values. Provide the column selection in the cols-parameter. Set the codes-parameter to NULL (its default value).
Value
A tibble containing the dataset with new labels.
Examples
library(volker)
# Set column labels using the items-parameter
volker::chatgpt %>%
labs_apply(
items = list(
"cg_adoption_advantage_01" = "Allgemeine Vorteile",
"cg_adoption_advantage_02" = "Finanzielle Vorteile",
"cg_adoption_advantage_03" = "Vorteile bei der Arbeit",
"cg_adoption_advantage_04" = "Macht mehr Spaß"
)
) %>%
tab_metrics(starts_with("cg_adoption_advantage_"))
# Set value labels using the values-parameter
volker::chatgpt %>%
labs_apply(
cols=starts_with("cg_adoption"),
values = list(
"1" = "Stimme überhaupt nicht zu",
"2" = "Stimme nicht zu",
"3" = "Unentschieden",
"4" = "Stimme zu",
"5" = "Stimme voll und ganz zu"
)
) %>%
plot_metrics(starts_with("cg_adoption"))
Remove all comments from the selected columns
Description
Remove all comments from the selected columns
Usage
labs_clear(data, cols, labels = NULL)
Arguments
data |
A tibble. |
cols |
Tidyselect columns. |
labels |
The attributes to remove. NULL to remove all attributes except levels and class. |
Value
A tibble with comments removed.
Examples
library(volker)
volker::chatgpt |>
labs_clear()
Add missing residual labels in numeric columns that have at least one labeled value
Description
Add missing residual labels in numeric columns that have at least one labeled value
Usage
labs_impute(data)
Arguments
data |
A tibble |
Value
A tibble with added value labels
Replace item value names in a column by their labels
Description
Replace item value names in a column by their labels
Usage
labs_replace(
data,
col,
codes,
col_from = "value_name",
col_to = "value_label",
na.missing = FALSE
)
Arguments
data |
A tibble. |
col |
The column holding item values. |
codes |
The codebook to use: A tibble with the columns value_name and value_label. Can be created by the codebook function. |
col_from |
The tidyselect column with source values, defaults to value_name. If the column is not found in the codebook, the first column is used. |
col_to |
The tidyselect column with target values, defaults to value_label. If the column is not found in the codebook, the second column is used. |
na.missing |
By default, the column is converted to a factor with levels combined from the codebook and the data. Set na.missing to TRUE to set all levels not found in the codes to NA. |
Value
Tibble with new labels.
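The effect of na.missing can be pictured with a small base R sketch. This is not volker's implementation, only an approximation of the behavior described in the Arguments section. The codebook, the data vector and all variable names are hypothetical.

```r
# Hypothetical two-row codebook and a data vector with one uncoded value ("3")
codes <- data.frame(
  value_name  = c("1", "2"),
  value_label = c("low", "high"),
  stringsAsFactors = FALSE
)
x <- c("1", "2", "3", "1")

# Look up the label for every value; values without a code yield NA
mapped <- codes$value_label[match(x, codes$value_name)]

# Analogue of na.missing = FALSE: keep uncoded values, and build the
# factor levels from the codebook first, then from the remaining data
kept <- ifelse(is.na(mapped), x, mapped)
with_data_levels <- factor(kept, levels = union(codes$value_label, kept))

# Analogue of na.missing = TRUE: only codebook labels are valid levels,
# everything else becomes NA
codes_only <- factor(mapped, levels = codes$value_label)

print(with_data_levels)
print(codes_only)
```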
Restore labels from the codebook stored in the codebook attribute.
Description
Restore labels from the codebook stored in the codebook attribute.
Usage
labs_restore(data, cols = NULL)
Arguments
data |
A data frame. |
cols |
A tidyselect column selection. |
Details
You can store labels before mutate operations by calling labs_store.
Value
A data frame.
Examples
library(dplyr)
library(volker)
volker::chatgpt |>
labs_store() |>
mutate(sd_age = 2024 - sd_age) |>
labs_restore() |>
tab_metrics(sd_age)
Get the current codebook and store it in the codebook attribute.
Description
Get the current codebook and store it in the codebook attribute.
Usage
labs_store(data)
Arguments
data |
A data frame. |
Details
You can restore the labels after mutate operations by calling labs_restore.
Value
A data frame.
Examples
library(dplyr)
library(volker)
volker::chatgpt |>
labs_store() |>
mutate(sd_age = 2024 - sd_age) |>
labs_restore() |>
tab_metrics(sd_age)
Plot regression coefficients
Description
The regression output comes from stats::lm.
Usage
model_metrics_plot(
data,
col,
categorical,
metric,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The target column holding metric values. |
categorical |
A tidy column selection holding categorical variables. |
metric |
A tidy column selection holding metric variables. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_metrics. |
Value
A volker list object containing volker plots
Examples
library(dplyr)
library(volker)
data <- volker::chatgpt
# filter() is provided by dplyr
data |>
filter(sd_gender != "diverse") |>
model_metrics_plot(use_work, categorical = c(sd_gender, adopter), metric = sd_age)
Output a regression table with estimates and macro statistics for multiple categorical or metric independent variables
Description
The regression output comes from stats::lm.
Usage
model_metrics_tab(
data,
col,
categorical,
metric,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The target column holding metric values. |
categorical |
A tidy column selection holding categorical variables. |
metric |
A tidy column selection holding metric variables. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from effect_metrics. |
Value
A volker list object containing volker tables with the requested statistics.
Examples
library(dplyr)
library(volker)
data <- volker::chatgpt
# filter() is provided by dplyr
data |>
filter(sd_gender != "diverse") |>
model_metrics_tab(use_work, categorical = c(sd_gender, adopter), metric = sd_age)
Mutate function
Description
See dplyr::mutate for details.
Convert a named vector to a list
Description
Convert a named vector to a list
Usage
named.to.list(x)
Arguments
x |
A named vector or a list |
Value
Lists are returned as is. Vectors are converted to lists with names as list names.
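The conversion described here can be sketched in base R as follows. The helper name to_list_sketch is hypothetical and only mirrors the documented behavior. It is not the package's own code.

```r
# Hypothetical stand-in for the documented behavior:
# lists pass through unchanged, named vectors become named lists
to_list_sketch <- function(x) {
  if (is.list(x)) x else as.list(x)
}

from_vector <- to_list_sketch(c(a = "first", b = "second"))
from_list   <- to_list_sketch(list(a = "first", b = "second"))

str(from_vector)
```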
Volker style PDF document format
Description
Based on the standard theme, tweaks TeX headers.
To use the format, in the header of your Markdown document, set output: volker::pdf_report.
Usage
pdf_report(...)
Arguments
... |
Additional arguments passed to pdf_document. |
Value
R Markdown output format.
Examples
## Not run:
# Add `volker::pdf_report` to the output options of your Markdown document:
#
# ```
# ---
# title: "How to create reports?"
# output: volker::pdf_report
# ---
# ```
## End(Not run)
Output a frequency plot
Description
The type of frequency plot depends on the number of selected columns:
One categorical column: see plot_counts_one
Multiple categorical columns: see plot_counts_items
Cross tabulations:
One categorical column and one grouping column: see plot_counts_one_grouped
Multiple categorical columns and one grouping column: see plot_counts_items_grouped
Two categorical column selections: see plot_counts_items_grouped_items (not yet implemented)
By default, if you provide two column selections, the second selection is treated as categorical. Setting the metric-parameter to TRUE will call the appropriate functions for correlation analysis:
One categorical column and one metric column: see plot_counts_one_cor
Multiple categorical columns and one metric column: see plot_counts_items_cor
Multiple categorical columns and multiple metric columns: see plot_counts_items_cor_items (not yet implemented)
Parameters that may be passed to the count functions (see the respective function help):
- ci: Add confidence intervals to proportions.
- ordered: The values of the cross column can be nominal (0), ordered ascending (1), or ordered descending (-1). The colors are adjusted accordingly.
- category: When you have multiple categories in a column, you can focus on one of the categories to simplify the plots. By default, if a column has only TRUE and FALSE values, the outputs focus on the TRUE category.
- prop: For stacked bar charts, displaying row percentages instead of total percentages gives a direct visual comparison of groups.
- limits: The scale limits are automatically guessed by the package functions (work in progress). Use the limits-parameter to manually fix any misleading graphs.
- title: All plots usually get a title derived from the column attributes or column names. Set to FALSE to suppress the title or provide a title of your choice as a character value.
- labels: Labels are extracted from the column attributes. Set to FALSE to output bare column names and values.
- numbers: Set the numbers parameter to "n" (frequency), "p" (percentage) or c("n","p"). To prevent cluttering and overlaps, numbers are only plotted on bars larger than 5%.
- width: When comparing groups by row or column percentages, by default, the bar or column width reflects the number of cases. You can disable this behavior by setting width to FALSE.
Usage
plot_counts(data, cols, cross = NULL, metric = FALSE, clean = TRUE, ...)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Optional, a grouping column. The column name without quotes. |
metric |
When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE to treat it as metric and calculate correlations. |
clean |
Prepare data by data_clean. |
... |
Other parameters passed to the appropriate plot function. |
Value
A ggplot2 plot object.
Examples
library(volker)
data <- volker::chatgpt
plot_counts(data, sd_gender)
Output frequencies for multiple variables
Description
Output frequencies for multiple variables
Usage
plot_counts_items(
data,
cols,
category = NULL,
ordered = NULL,
ci = FALSE,
limits = NULL,
numbers = NULL,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
category |
Set to FALSE to plot all categories. A character value focuses on the selected category. When NULL and the column holds boolean values, only the TRUE category is plotted. |
ordered |
Values can be nominal (0), ordered ascending (1), or ordered descending (-1). By default (NULL), the ordering is automatically detected. An appropriate color scale should be chosen depending on the ordering. For unordered values, colors from VLKR_FILLDISCRETE are used. For ordered values, shades of the VLKR_FILLGRADIENT option are used. |
ci |
Whether to plot error bars for 95% confidence intervals. |
limits |
The scale limits, autoscaled by default.
Set to |
numbers |
The values to print on the bars: "n" (frequency), "p" (percentage) or both. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_counts_items(data, starts_with("cg_adoption_"))
Plot percent shares of multiple items compared by a metric variable split into groups
Description
Plot percent shares of multiple items compared by a metric variable split into groups
Usage
plot_counts_items_cor(
data,
cols,
cross,
category = NULL,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
A metric column that will be split into groups at the median. |
category |
Summarizing multiple items (the cols parameter) by group requires a focus category. By default, for logical column types, only TRUE values are counted. For other column types, the first category is counted. To override the default behavior, provide a vector of values in the dataset or labels from the codebook. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_counts_items_cor(
data, starts_with("cg_adoption_"), sd_age,
category=c("agree","strongly agree")
)
plot_counts_items_cor(
data, starts_with("cg_adoption_"), sd_age,
category=c(4,5)
)
Correlation of categorical items with metric items
Description
Not yet implemented. The future will come.
Usage
plot_counts_items_cor_items(
data,
cols,
cross,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
Tidyselect item variables (e.g. starts_with...). |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A ggplot object.
Plot percent shares of multiple items compared by groups
Description
Plot percent shares of multiple items compared by groups
Usage
plot_counts_items_grouped(
data,
cols,
cross,
category = NULL,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The column holding groups to compare. |
category |
Summarizing multiple items (the cols parameter) by group requires a focus category. By default, for logical column types, only TRUE values are counted. For other column types, the first category is counted. To override the default behavior, provide a vector of values in the dataset or labels from the codebook. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_counts_items_grouped(
data, starts_with("cg_adoption_"), adopter,
category=c("agree","strongly agree")
)
plot_counts_items_grouped(
data, starts_with("cg_adoption_"), adopter,
category=c(4,5)
)
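What a focus category means for grouped item plots can be approximated in base R. The data frame, the item names and the group column below are invented, and the calculation is a sketch of the idea (share of answers falling into the focus category, per item and group), not the package's internal code.

```r
# Hypothetical answers on a 1-5 scale for two items and two groups
answers <- data.frame(
  group = c("a", "a", "b", "b"),
  item1 = c(5, 2, 4, 4),
  item2 = c(1, 4, 3, 5)
)

# Focus category: values 4 and 5, as in category = c(4, 5)
focus <- c(4, 5)

# Flag every answer that falls into the focus category
in_focus <- answers[, c("item1", "item2")]
in_focus[] <- lapply(in_focus, function(x) x %in% focus)

# Share of focus answers per item and group
shares <- aggregate(in_focus, by = list(group = answers$group), FUN = mean)
print(shares)
```

Reducing every item to one share per group is what makes several items comparable in a single chart.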
Correlation of categorical items with categorical items
Description
Not yet implemented. The future will come.
Usage
plot_counts_items_grouped_items(
data,
cols,
cross,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
Tidyselect item variables (e.g. starts_with...). |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A ggplot object.
Plot the frequency of values in one column
Description
Plot the frequency of values in one column
Usage
plot_counts_one(
data,
col,
category = NULL,
ci = FALSE,
limits = NULL,
numbers = NULL,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding values to count. |
category |
Set to FALSE to plot all categories. A character value focuses on the selected category. When NULL and the column holds boolean values, only the TRUE category is plotted. |
ci |
Whether to plot error bars for 95% confidence intervals. |
limits |
The scale limits, autoscaled by default.
Set to |
numbers |
The values to print on the bars: "n" (frequency), "p" (percentage) or both. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_counts_one(data, sd_gender)
Plot frequencies cross tabulated with a metric column that will be split into groups
Description
Plot frequencies cross tabulated with a metric column that will be split into groups
Usage
plot_counts_one_cor(
data,
col,
cross,
category = NULL,
prop = "total",
limits = NULL,
ordered = NULL,
numbers = NULL,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding factor values. |
cross |
A metric column that will be split into groups at the median. |
category |
Set to FALSE to plot all categories. A character value focuses on the selected category. When NULL and the column holds boolean values, only the TRUE category is plotted. |
prop |
The basis of percent calculation: "total" (the default), "rows" or "cols". Plotting row or column percentages results in stacked bars that add up to 100%. Whether you set rows or cols determines which variable is in the legend (fill color) and which on the vertical scale. |
limits |
The scale limits, autoscaled by default.
Set to |
ordered |
The values of the cross column can be nominal (0), ordered ascending (1), or descending (-1). By default (NULL), the ordering is automatically detected. An appropriate color scale should be chosen depending on the ordering. For unordered values, colors from VLKR_FILLDISCRETE are used. For ordered values, shades of the VLKR_FILLGRADIENT option are used. |
numbers |
The numbers to print on the bars: "n" (frequency), "p" (percentage) or both. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_counts_one_cor(data, adopter, sd_age)
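Splitting a metric column at the median, as described for the cross parameter, can be pictured with the base R sketch below. The ages and group names are made up. How the package assigns values that lie exactly on the median is not stated in this documentation, so the "at or below" rule used here is an assumption.

```r
# Hypothetical ages
age <- c(19, 23, 31, 35, 42, 58)

# Cut point: the median of the metric column
cut_point <- median(age)

# Assumption: values at or below the median form the lower group
age_group <- ifelse(age <= cut_point, "low", "high")

table(age_group)
```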
Plot frequencies cross tabulated with a grouping column
Description
Plot frequencies cross tabulated with a grouping column
Usage
plot_counts_one_grouped(
data,
col,
cross,
category = NULL,
prop = "total",
width = NULL,
limits = NULL,
ordered = NULL,
numbers = NULL,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding factor values. |
cross |
The column holding groups to split. |
category |
Set to FALSE to plot all categories. A character value focuses on the selected category. When NULL and the column holds boolean values, only the TRUE category is plotted. |
prop |
The basis of percent calculation: "total" (the default), "rows" or "cols". Plotting row or column percentages results in stacked bars that add up to 100%. Whether you set rows or cols determines which variable is in the legend (fill color) and which on the vertical scale. |
width |
By default, when setting the prop parameter to "rows" or "cols", the bar or column width reflects the number of cases. You can disable this behavior by setting width to FALSE. |
limits |
The scale limits, autoscaled by default.
Set to |
ordered |
The values of the cross column can be nominal (0), ordered ascending (1), or descending (-1). By default (NULL), the ordering is automatically detected. An appropriate color scale should be chosen depending on the ordering. For unordered values, colors from VLKR_FILLDISCRETE are used. For ordered values, shades of the VLKR_FILLGRADIENT option are used. |
numbers |
The numbers to print on the bars: "n" (frequency), "p" (percentage) or both. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_counts_one_grouped(data, adopter, sd_gender)
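The three bases of percent calculation named for the prop parameter correspond to the usual ways of normalizing a cross table. The base R sketch below uses invented data to show the arithmetic. Which margin volker's "rows" and "cols" refer to depends on how the function arranges col and cross, so no mapping is assumed here.

```r
# Hypothetical cross table of a grouping variable and an answer
tab <- table(
  group  = c("a", "a", "b", "b", "b"),
  answer = c("yes", "no", "yes", "yes", "no")
)

# Total percentages: all cells together add up to 1
prop_total <- prop.table(tab)

# Percentages within the first margin: every row adds up to 1
prop_margin1 <- prop.table(tab, margin = 1)

# Percentages within the second margin: every column adds up to 1
prop_margin2 <- prop.table(tab, margin = 2)

print(round(prop_margin1, 2))
```

Margin-based percentages are what make stacked bars add up to 100%, which is why they allow a direct comparison of groups of different size.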
Output a plot with distribution parameters such as the mean values
Description
The plot type depends on the number of selected columns:
One metric column: see plot_metrics_one
Multiple metric columns: see plot_metrics_items
Group comparisons:
One metric column and one grouping column: see plot_metrics_one_grouped
Multiple metric columns and one grouping column: see plot_metrics_items_grouped
Multiple metric columns and multiple grouping columns: see plot_metrics_items_grouped_items (not yet implemented)
By default, if you provide two column selections, the second selection is treated as categorical. Setting the metric-parameter to TRUE will call the appropriate functions for correlation analysis:
Two metric columns: see plot_metrics_one_cor
Multiple metric columns and one metric column: see plot_metrics_items_cor
Two metric column selections: see plot_metrics_items_cor_items
Parameters that may be passed to the metric functions (see the respective function help):
- ci: Plot confidence intervals for means or correlation coefficients.
- box: Visualize the distribution by adding boxplots.
- log: In scatter plots, you can use a logarithmic scale. Be aware that zero values will be omitted because their log value is undefined.
- method: By default, correlations are calculated using Pearson's R. You can choose Spearman's Rho with the method-parameter.
- limits: The scale limits are automatically guessed by the package functions (work in progress). Use the limits-parameter to manually fix any misleading graphs.
- title: All plots usually get a title derived from the column attributes or column names. Set to FALSE to suppress the title or provide a title of your choice as a character value.
- labels: Labels are extracted from the column attributes. Set to FALSE to output bare column names and values.
- numbers: Controls whether to display correlation coefficients on the plot.
Usage
plot_metrics(data, cols, cross = NULL, metric = FALSE, clean = TRUE, ...)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Optional, a grouping column (without quotes). |
metric |
When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE to treat it as metric and calculate correlations. |
clean |
Prepare data by data_clean. |
... |
Other parameters passed to the appropriate plot function. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_metrics(data, sd_age)
Output averages for multiple variables
Description
Output averages for multiple variables
Usage
plot_metrics_items(
data,
cols,
ci = FALSE,
box = FALSE,
limits = NULL,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
ci |
Whether to plot the 95% confidence interval of the mean. |
box |
Whether to add boxplots. |
limits |
The scale limits. Set NULL to extract limits from the labels. NOT IMPLEMENTED YET. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_metrics_items(data, starts_with("cg_adoption_"))
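For readers who want to see what a 95% confidence interval of the mean amounts to, here is a base R sketch using the t distribution. The values are invented, and the documentation does not state which method the package uses, so this shows the textbook calculation rather than volker's own.

```r
# Hypothetical item values
x <- c(21, 25, 30, 34, 40)

n  <- length(x)
m  <- mean(x)
se <- sd(x) / sqrt(n)

# Textbook 95% interval: mean plus/minus t quantile times standard error
ci_manual <- m + c(-1, 1) * qt(0.975, df = n - 1) * se

# The same interval as reported by a one-sample t-test
ci_ttest <- as.numeric(t.test(x)$conf.int)

print(ci_manual)
```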
Multiple items correlated with one metric variable
Description
Multiple items correlated with one metric variable
Usage
plot_metrics_items_cor(
data,
cols,
cross,
ci = FALSE,
method = "pearson",
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The column to correlate. |
ci |
Whether to plot confidence intervals of the correlation coefficient. |
method |
The method of correlation calculation, pearson = Pearson's R, spearman = Spearman's rho. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_metrics_items_cor(data, starts_with("use_"), sd_age)
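The difference between the two values of the method parameter can be shown with stats::cor on invented data. This is a base R sketch of the statistics involved, independent of how the package computes or plots them.

```r
# Hypothetical data with a monotone but non-linear relationship
x <- c(1, 2, 3, 4, 5)
y <- c(1, 4, 9, 16, 25)

# Pearson's R measures linear association
r_pearson <- cor(x, y, method = "pearson")

# Spearman's rho works on ranks and captures any monotone association
rho_spearman <- cor(x, y, method = "spearman")

# A confidence interval for Pearson's R, as one possible reading of ci = TRUE
ci_pearson <- cor.test(x, y, method = "pearson")$conf.int

print(c(pearson = r_pearson, spearman = rho_spearman))
```

Because the ranks of x and y agree perfectly here, Spearman's rho reaches 1 while Pearson's R stays below 1.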
Heatmap for correlations between multiple items
Description
Heatmap for correlations between multiple items
Usage
plot_metrics_items_cor_items(
data,
cols,
cross,
method = "pearson",
numbers = FALSE,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
Tidyselect item variables to correlate (e.g. starts_with...). |
method |
The method of correlation calculation, pearson = Pearson's R, spearman = Spearman's rho. |
numbers |
Controls whether to display correlation coefficients on the plot. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_metrics_items_cor_items(data, starts_with("cg_adoption_adv"), starts_with("use_"))
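A heatmap of this kind is built on a matrix of correlations between two sets of columns. The base R sketch below computes such a matrix for R's built-in mtcars data. The choice of columns is arbitrary and only stands in for the cols and cross selections.

```r
# Two arbitrary column sets from the built-in mtcars data,
# standing in for the cols and cross selections
set_a <- mtcars[, c("mpg", "hp")]
set_b <- mtcars[, c("wt", "qsec", "drat")]

# One correlation per pair: rows follow set_a, columns follow set_b
cors <- cor(set_a, set_b, method = "pearson")

print(round(cors, 2))
```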
Output averages for multiple variables compared by a grouping variable
Description
Output averages for multiple variables compared by a grouping variable
Usage
plot_metrics_items_grouped(
data,
cols,
cross,
limits = NULL,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The column holding groups to compare. |
limits |
The scale limits. Set NULL to extract limits from the labels. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_metrics_items_grouped(data, starts_with("cg_adoption_"), sd_gender)
Correlation of metric items with categorical items
Description
Not yet implemented. The future will come.
Usage
plot_metrics_items_grouped_items(
data,
cols,
cross,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
Tidyselect item variables (e.g. starts_with...) |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A ggplot object.
Output a density plot for a single metric variable
Description
Output a density plot for a single metric variable
Usage
plot_metrics_one(
data,
col,
ci = FALSE,
box = FALSE,
limits = NULL,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding metric values. |
ci |
Whether to plot the confidence interval. |
box |
Whether to add a boxplot. |
limits |
The scale limits. Set NULL to extract limits from the label. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_metrics_one(data, sd_age)
Correlate two items
Description
Correlate two items
Usage
plot_metrics_one_cor(
data,
col,
cross,
limits = NULL,
log = FALSE,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The first column holding metric values. |
cross |
The second column holding metric values. |
limits |
The scale limits, a list with x and y components, e.g. |
log |
Whether to plot log scales. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_metrics_one_cor(data, use_private, sd_age)
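When log scales are used, zero values cannot be shown. The following base R sketch with invented values illustrates why: the logarithm of zero is not a finite number, so such points have no position on the axis.

```r
# Hypothetical metric values including a zero
x <- c(0, 1, 10, 100)

# log10(0) is -Inf and cannot be placed on a log axis
log_x <- log10(x)

# Only finite values can be drawn
drawable <- is.finite(log_x)

print(x[drawable])
```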
Output averages for one metric variable compared by a grouping variable
Description
Output averages for one metric variable compared by a grouping variable
Usage
plot_metrics_one_grouped(
data,
col,
cross,
ci = FALSE,
box = FALSE,
limits = NULL,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding metric values. |
cross |
The column holding groups to compare. |
ci |
Whether to add error bars with 95% confidence intervals. |
box |
Whether to add boxplots. |
limits |
The scale limits. Set NULL to extract limits from the labels. |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A ggplot object.
Examples
library(volker)
data <- volker::chatgpt
plot_metrics_one_grouped(data, sd_age, sd_gender)
Prepare the scale attribute values
Description
Prepare the scale attribute values
Usage
prepare_scale(data)
Arguments
data |
A tibble with a scale attribute. |
Value
A named list or NULL.
Printing method for volker lists
Description
Printing method for volker lists
Usage
## S3 method for class 'vlkr_list'
print(x, ...)
Arguments
x |
The volker list. |
... |
Further parameters passed to print. |
Value
No return value.
Examples
library(volker)
data <- volker::chatgpt
rp <- report_metrics(data, sd_age, sd_gender, effect = TRUE)
print(rp)
Printing method for volker plots
Description
Printing method for volker plots
Usage
## S3 method for class 'vlkr_plt'
print(x, ...)
## S3 method for class 'vlkr_plt'
plot(x, ...)
Arguments
x |
The volker plot. |
... |
Further parameters passed to print(). |
Value
No return value.
Examples
library(volker)
data <- volker::chatgpt
pl <- plot_metrics(data, sd_age)
print(pl)
Printing method for volker reports
Description
Printing method for volker reports
Usage
## S3 method for class 'vlkr_rprt'
print(x, ...)
Arguments
x |
The volker report object. |
... |
Further parameters passed to print. |
Value
No return value.
Examples
library(volker)
data <- volker::chatgpt
rp <- report_metrics(data, sd_age)
print(rp)
Printing method for volker tables.
Description
Printing method for volker tables.
Usage
## S3 method for class 'vlkr_tbl'
print(x, ...)
Arguments
x |
The volker table. |
... |
Further parameters passed to print(). |
Value
No return value.
Examples
library(volker)
data <- volker::chatgpt
tb <- tab_metrics(data, sd_age)
print(tb)
Create table and plot for categorical variables
Description
Depending on your column selection, different types of plots and tables are generated. See plot_counts and tab_counts.
Usage
report_counts(
data,
cols,
cross = NULL,
metric = FALSE,
index = FALSE,
effect = FALSE,
numbers = NULL,
title = TRUE,
close = TRUE,
clean = TRUE,
...
)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Optional, a grouping column (without quotes). |
metric |
When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE to treat it as metric and calculate correlations. |
index |
When the cols contain items on a metric scale (as determined by get_direction), an index will be calculated using the 'psych' package. Set to FALSE to suppress index generation. |
effect |
Whether to report statistical tests and effect sizes. See effect_counts for further parameters. |
numbers |
The numbers to print on the bars: "n" (frequency), "p" (percentage) or both. Set to NULL to remove numbers. |
title |
A character providing the heading or TRUE (default) to output a heading. Classes for tabset pills will be added. |
close |
Whether to close the last tab (default value TRUE) or to keep it open. Keep it open to add further custom tabs by adding headers on the fifth level in Markdown (e.g. ##### Method). |
clean |
Prepare data by data_clean. |
... |
Parameters passed to the plot_counts, tab_counts and effect_counts functions. |
Details
For item batteries, an index is calculated and reported. When used in combination with the Markdown-template "html_report", the different parts of the report are grouped under a tabsheet selector.
Value
A volker report object.
Examples
library(volker)
data <- volker::chatgpt
report_counts(data, sd_gender)
Create table and plot for metric variables
Description
Depending on your column selection, different types of plots and tables are generated. See plot_metrics and tab_metrics.
Usage
report_metrics(
data,
cols,
cross = NULL,
metric = FALSE,
...,
index = FALSE,
factors = FALSE,
clusters = FALSE,
effect = FALSE,
title = TRUE,
close = TRUE,
clean = TRUE
)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Optional, a grouping or correlation column (without quotes). |
metric |
When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE to treat it as metric and calculate correlations. |
... |
Parameters passed to the plot_metrics, tab_metrics and effect_metrics functions. |
index |
When the cols contain items on a metric scale (as determined by get_direction), an index will be calculated using the 'psych' package. Set to FALSE to suppress index generation. |
factors |
The number of factors to calculate. Set to FALSE to suppress factor analysis. Set to TRUE to output a scree plot and automatically choose the number of factors. When the cols contain items on a metric scale (as determined by get_direction), factors will be calculated using the 'psych' package. See add_factors. |
clusters |
The number of clusters to calculate. Clusters are determined using kmeans after scaling the items. Set to FALSE to suppress cluster analysis. Set to TRUE to output a scree plot and automatically choose the number of clusters based on the elbow criterion. See add_clusters. |
effect |
Whether to report statistical tests and effect sizes. See effect_metrics for further parameters. |
title |
A character providing the heading or TRUE (default) to output a heading. Classes for tabset pills will be added. |
close |
Whether to close the last tab (default value TRUE) or to keep it open. Keep it open to add further custom tabs by adding headers on the fifth level in Markdown (e.g. ##### Method). |
clean |
Prepare data by data_clean. |
Details
For item batteries, an index is calculated and reported. When used in combination with the Markdown-template "html_report", the different parts of the report are grouped under a tabsheet selector.
Value
A volker report object.
Examples
library(volker)
data <- volker::chatgpt
report_metrics(data, sd_age)
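The clusters argument above mentions kmeans on scaled items and an elbow criterion. The following base R sketch shows that general idea on simulated data. It is not volker's implementation: the data, the range of candidate cluster counts, and the choice of two clusters are all assumptions made for illustration.

```r
# Simulated item battery with two well separated groups (hypothetical data).
set.seed(42)
items <- data.frame(
  item1 = c(rnorm(20, mean = 1), rnorm(20, mean = 5)),
  item2 = c(rnorm(20, mean = 1), rnorm(20, mean = 5))
)

# Scale the items so that each one contributes equally to the distances.
scaled <- scale(items)

# Total within-cluster sum of squares for 1 to 5 clusters.
# Plotting these values gives the scree plot in which one looks for an elbow.
wss <- sapply(1:5, function(k) {
  stats::kmeans(scaled, centers = k, nstart = 10)$tot.withinss
})

# Fit the solution chosen from the plot (two clusters are assumed here).
fit <- stats::kmeans(scaled, centers = 2, nstart = 10)
table(fit$cluster)
```

How volker picks the elbow automatically is not described in this manual, so the sketch leaves that step to visual inspection.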
Select function
Description
See dplyr::select for details.
A skimmer for boxplot generation
Description
Returns a five point summary, mean and sd, items count and alpha for scales added by add_index(). Additionally, the whiskers are calculated, defined as the minimum and maximum values within 1.5 * IQR, respectively. Outliers are returned in a list column.
Usage
skim_boxplot(data, ..., .data_name = NULL)
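The whisker rule described for skim_boxplot can be sketched in base R as follows. The helper name boxplot_whiskers is hypothetical and not part of volker. The sketch assumes the common convention that the fences lie 1.5 * IQR below the first and above the third quartile.

```r
# Hypothetical helper: whiskers and outliers by the 1.5 * IQR rule.
boxplot_whiskers <- function(x) {
  x <- x[!is.na(x)]
  q <- stats::quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[2] + 1.5 * iqr

  # Whiskers end at the most extreme values that still lie inside the fences.
  inside <- x[x >= lower_fence & x <= upper_fence]
  list(
    whisker_lo = min(inside),
    whisker_hi = max(inside),
    outliers = x[x < lower_fence | x > upper_fence]
  )
}

res <- boxplot_whiskers(c(1, 2, 3, 4, 5, 100))
res$whisker_lo  # 1
res$whisker_hi  # 5
res$outliers    # 100
```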
Calculate a metric by groups
Description
Calculate a metric by groups
Usage
skim_grouped(data, cols, cross, value = "numeric.mean", labels = TRUE)
Arguments
data |
A tibble. |
cols |
The item columns that hold the values to summarize. |
cross |
The column holding groups to compare. |
value |
The metric to extract from the skim result, e.g. numeric.mean or numeric.sd. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
Value
A tibble with each item in a row, a total column and columns for all groups.
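The shape of the value returned by skim_grouped (one row per item, a total column, one column per group) can be illustrated with base R alone. The data frame and column names below are made up, and the sketch only covers means, not the other skim metrics.

```r
# Hypothetical survey data: two items and one grouping column.
survey <- data.frame(
  gender = c("f", "f", "m", "m"),
  item1 = c(1, 3, 2, 4),
  item2 = c(5, 5, 3, 1)
)
items <- c("item1", "item2")

# Mean of each item within each group: rows are items, columns are groups.
by_group <- sapply(split(survey[items], survey$gender), colMeans)

# Add the overall mean as a total column.
result <- data.frame(
  item = items,
  total = unname(colMeans(survey[items])),
  by_group,
  row.names = NULL
)
result
```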
A reduced skimmer for metric variables
Description
A reduced skimmer for metric variables. Returns a five point summary, mean and sd, items count and alpha for scales added by add_index().
Usage
skim_metrics(data, ..., .data_name = NULL)
Value
A skimmer, see skim_with
Examples
library(volker)
data <- volker::chatgpt
skim_metrics(data)
Select variables by their prefix
Description
See tidyselect::starts_with for details.
Output a frequency table
Description
The type of frequency table depends on the number of selected columns:
One categorical column: see tab_counts_one
Multiple categorical columns: see tab_counts_items
Cross tabulations:
One categorical column and one grouping column: see tab_counts_one_grouped
Multiple categorical columns and one grouping column: see tab_counts_items_grouped
Multiple categorical columns and multiple grouping columns: see tab_counts_items_grouped_items (not yet implemented)
By default, if you provide two column selections, the second column is treated as categorical. Setting the metric-parameter to TRUE will call the appropriate functions for correlation analysis:
One categorical column and one metric column: see tab_counts_one_cor
Multiple categorical columns and one metric column: see tab_counts_items_cor
Multiple categorical columns and multiple metric columns: see tab_counts_items_cor_items (not yet implemented)
Parameters that may be passed to specific count functions:
- ci: Add confidence intervals to proportions.
- percent: Frequency tables show percentages by default. Set to FALSE to get raw proportions.
- prop: For cross tables you can choose between total, row or column percentages.
- values: The values to output: n (frequency) or p (percentage) or both (the default).
- category: When you have multiple categories in a column, you can focus on one of the categories to simplify the plots. By default, if a column has only TRUE and FALSE values, the outputs focus on the TRUE category.
- labels: Labels are extracted from the column attributes. Set to FALSE to output bare column names and values.
Usage
tab_counts(data, cols, cross = NULL, metric = FALSE, clean = TRUE, ...)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Optional, a grouping column. The column name without quotes. |
metric |
When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE to treat it as metric and calculate correlations. |
clean |
Prepare data by data_clean. |
... |
Other parameters passed to the appropriate table function. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_counts(data, sd_gender)
Output frequencies for multiple variables
Description
Output frequencies for multiple variables
Usage
tab_counts_items(
data,
cols,
ci = FALSE,
percent = TRUE,
values = c("n", "p"),
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
ci |
Whether to compute 95% confidence intervals. |
percent |
Set to FALSE to prevent calculating percents from proportions. |
values |
The values to output: n (frequency) or p (percentage) or both (the default). |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_counts. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_counts_items(data, starts_with("cg_adoption_"))
Compare the values in multiple items by a metric column that will be split into groups
Description
Compare the values in multiple items by a metric column that will be split into groups
Usage
tab_counts_items_cor(
data,
cols,
cross,
category = NULL,
split = NULL,
percent = TRUE,
values = c("n", "p"),
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
A metric column that will be split into groups at the median value. |
category |
Summarizing multiple items (the cols parameter) by group requires a focus category. By default, for logical column types, only TRUE values are counted. For other column types, the first category is counted. Accepts both character and numeric vectors to override default counting behavior. |
split |
Not implemented yet. |
percent |
Proportions are formatted as percent by default. Set to FALSE to get bare proportions. |
values |
The values to output: n (frequency) or p (percentage) or both (the default). |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_counts_items_cor(
data, starts_with("cg_adoption_"), sd_age,
category=c("agree", "strongly agree")
)
Correlation of categorical items with metric items
Description
Not yet implemented. The future will come.
Usage
tab_counts_items_cor_items(
data,
cols,
cross,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
Tidyselect item variables (e.g. starts_with...). |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A volker tibble.
Compare the values in multiple items by a grouping column
Description
Compare the values in multiple items by a grouping column
Usage
tab_counts_items_grouped(
data,
cols,
cross,
category = NULL,
percent = TRUE,
values = c("n", "p"),
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
The column holding groups to compare. |
category |
Summarizing multiple items (the cols parameter) by group requires a focus category. By default, for logical column types, only TRUE values are counted. For other column types, the first category is counted. Accepts both character and numeric vectors to override default counting behavior. |
percent |
Proportions are formatted as percent by default. Set to FALSE to get bare proportions. |
values |
The values to output: n (frequency) or p (percentage) or both (the default). |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_counts_items_grouped(
data, starts_with("cg_adoption_"), adopter,
category=c("agree", "strongly agree")
)
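The idea of a focus category, as used by the category argument, can be sketched in base R: each item is reduced to "in the focus categories or not", and the share of hits is then compared across groups. The data and column names are invented for illustration, and volker's own counting may differ in details such as handling of missing values.

```r
# Hypothetical answers to two items, with a grouping column.
answers <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  item1 = c("agree", "disagree", "agree", "strongly agree", "disagree"),
  item2 = c("disagree", "disagree", "agree", "agree", "strongly agree")
)

# The categories to focus on.
focus <- c("agree", "strongly agree")

# Turn each item into a logical vector: TRUE if the answer is a focus category.
hits <- answers[c("item1", "item2")]
hits[] <- lapply(hits, function(x) x %in% focus)

# Share of focus answers per group and item.
shares <- stats::aggregate(hits, by = list(group = answers$group), FUN = mean)
shares
```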
Correlation of categorical items with categorical items
Description
Not yet implemented. The future will come.
Usage
tab_counts_items_grouped_items(
data,
cols,
cross,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
Tidyselect item variables (e.g. starts_with...). |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_counts. |
Value
A volker tibble.
Output a frequency table for the values in one column
Description
Output a frequency table for the values in one column
Usage
tab_counts_one(
data,
col,
ci = FALSE,
percent = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding values to count. |
ci |
Whether to compute 95% confidence intervals using |
percent |
Proportions are formatted as percent by default. Set to FALSE to get bare proportions. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_counts. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_counts_one(data, sd_gender)
Count values by a metric column that will be split into groups
Description
Count values by a metric column that will be split into groups
Usage
tab_counts_one_cor(
data,
col,
cross,
prop = "total",
percent = TRUE,
values = c("n", "p"),
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding factor values. |
cross |
The metric column that will be split into groups at the median. |
prop |
The basis of percent calculation: "total" (the default), "cols", or "rows". |
percent |
Proportions are formatted as percent by default. Set to FALSE to get bare proportions. |
values |
The values to output: n (frequency) or p (percentage) or both (the default). |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_counts. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_counts_one_cor(data, adopter, sd_age)
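Splitting a metric column at the median, as described for the cross argument, can be sketched in base R like this. The vectors are made up, and two details are assumptions: that values equal to the median fall into the lower group, and the group labels "low" and "high".

```r
# Hypothetical data: a metric column and a categorical column.
age <- c(18, 22, 25, 31, 40, 57)
adopter <- c("early", "early", "late", "early", "late", "late")

# Split the metric column at its median (28 for these values).
med <- stats::median(age)

# Assumption: values at or below the median go to the "low" group.
age_group <- ifelse(age <= med, "low", "high")

# Cross tabulate the categorical column with the two groups.
tab <- table(adopter, age_group)
tab
```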
Output frequencies cross tabulated with a grouping column
Description
Output frequencies cross tabulated with a grouping column
Usage
tab_counts_one_grouped(
data,
col,
cross,
prop = "total",
percent = TRUE,
values = c("n", "p"),
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding factor values. |
cross |
The column holding groups to split. |
prop |
The basis of percent calculation: "total" (the default), "cols", or "rows". |
percent |
Proportions are formatted as percent by default. Set to FALSE to get bare proportions. |
values |
The values to output: n (frequency) or p (percentage) or both (the default). |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_counts. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_counts_one_grouped(data, adopter, sd_gender)
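The three bases of percent calculation named by the prop argument correspond to the usual total, row and column proportions of a cross table. A base R sketch with invented counts follows. Which variable volker places in rows and which in columns is not stated in this manual, so the orientation here is an assumption.

```r
# Hypothetical cross table of counts.
tab <- matrix(
  c(10, 20, 30, 40),
  nrow = 2,
  dimnames = list(adopter = c("yes", "no"), gender = c("f", "m"))
)

p_total <- prop.table(tab)              # each cell divided by the grand total
p_rows <- prop.table(tab, margin = 1)   # each cell divided by its row sum
p_cols <- prop.table(tab, margin = 2)   # each cell divided by its column sum

p_total["yes", "f"]  # 0.1
p_rows["yes", "f"]   # 0.25
```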
Output a table with distribution parameters
Description
The table type depends on the number of selected columns:
One metric column: see tab_metrics_one
Multiple metric columns: see tab_metrics_items
Group comparisons:
One metric column and one grouping column: see tab_metrics_one_grouped
Multiple metric columns and one grouping column: see tab_metrics_items_grouped
Multiple metric columns and multiple grouping columns: see tab_metrics_items_grouped_items (not yet implemented)
By default, if you provide two column selections, the second column is treated as categorical. Setting the metric-parameter to TRUE will call the appropriate functions for correlation analysis:
Two metric columns: see tab_metrics_one_cor
Multiple metric columns and one metric column: see tab_metrics_items_cor
Two metric column selections: see tab_metrics_items_cor_items
Parameters that may be passed to specific metric functions:
- ci: Add confidence intervals for means or correlation coefficients.
- values: The output metrics, mean (m), the standard deviation (sd) or both (the default).
- digits: Tables containing means and standard deviations by default round values to one digit. Increase the number to show more digits.
- method: By default, correlations are calculated using Pearson's r. You can choose Spearman's rho with the method parameter.
- labels: Labels are extracted from the column attributes. Set to FALSE to output bare column names and values.
Usage
tab_metrics(data, cols, cross = NULL, metric = FALSE, clean = TRUE, ...)
Arguments
data |
A data frame. |
cols |
A tidy column selection, e.g. a single column (without quotes) or multiple columns selected by methods such as starts_with(). |
cross |
Optional, a grouping column (without quotes). |
metric |
When crossing variables, the cross column parameter can contain categorical or metric values. By default, the cross column selection is treated as categorical data. Set metric to TRUE to treat it as metric and calculate correlations. |
clean |
Prepare data by data_clean. |
... |
Other parameters passed to the appropriate table function. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_metrics(data, sd_age)
Output a five point summary table for multiple items
Description
Output a five point summary table for multiple items
Usage
tab_metrics_items(
data,
cols,
ci = FALSE,
digits = 1,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
cols |
The columns holding metric values. |
ci |
Whether to compute confidence intervals of the mean. |
digits |
The number of digits to print. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_metrics. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_metrics_items(data, starts_with("cg_adoption_"))
Output a correlation table for item battery and one metric variable
Description
Usage
tab_metrics_items_cor(
data,
cols,
cross,
method = "pearson",
digits = 2,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
cols |
The source columns. |
cross |
The target columns or NULL to calculate correlations within the source columns. |
method |
The correlation method: "pearson" for Pearson's r or "spearman" for Spearman's rho. |
digits |
The number of digits to print. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_metrics. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_metrics_items_cor(
data,
starts_with("cg_adoption_adv"),
sd_age,
metric = TRUE
)
Output a correlation table for item battery and item battery
Description
Usage
tab_metrics_items_cor_items(
data,
cols,
cross,
method = "pearson",
digits = 2,
ci = FALSE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
cols |
The source columns. |
cross |
The target columns or NULL to calculate correlations within the source columns. |
method |
The correlation method: "pearson" for Pearson's r or "spearman" for Spearman's rho. |
digits |
The number of digits to print. |
ci |
Whether to calculate 95% confidence intervals of the correlation coefficient. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_metrics. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_metrics_items_cor_items(
data,
starts_with("cg_adoption_adv"),
starts_with("use"),
metric = TRUE
)
Output the means for groups in one or multiple columns
Description
Output the means for groups in one or multiple columns
Usage
tab_metrics_items_grouped(
data,
cols,
cross,
digits = 1,
values = c("m", "sd"),
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
cols |
The item columns that hold the values to summarize. |
cross |
The column holding groups to compare. |
digits |
The number of digits to print. |
values |
The output metrics, mean (m), the standard deviation (sd) or both (the default). |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_metrics. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_metrics_items_grouped(data, starts_with("cg_adoption_"), sd_gender)
Correlation of metric items with categorical items
Description
Not yet implemented. The future will come.
Usage
tab_metrics_items_grouped_items(
data,
cols,
cross,
title = TRUE,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble containing item measures. |
cols |
Tidyselect item variables (e.g. starts_with...). |
cross |
Tidyselect item variables (e.g. starts_with...). |
title |
If TRUE (default) shows a plot title derived from the column labels. Disable the title with FALSE or provide a custom title as character value. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from plot_metrics. |
Value
A volker tibble.
Output a five point summary table for the values in one column
Description
Output a five point summary table for the values in one column
Usage
tab_metrics_one(
data,
col,
ci = FALSE,
digits = 1,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding metric values. |
ci |
Whether to calculate 95% confidence intervals of the mean. |
digits |
The number of digits to print. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_metrics. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_metrics_one(data, sd_age)
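For orientation, the kind of numbers such a table contains (five point summary, mean, standard deviation and a 95% confidence interval of the mean) can be computed in base R as sketched below. The values are invented, and the t-based interval is an assumption, since this manual does not state how volker computes the interval.

```r
# Hypothetical metric values.
age <- c(21, 25, 30, 34, 40)

# Five point summary: minimum, first quartile, median, third quartile, maximum.
five_point <- stats::quantile(age, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)

# Mean and standard deviation.
m <- mean(age)
s <- stats::sd(age)

# Assumption: a t-based 95% confidence interval of the mean.
ci <- stats::t.test(age, conf.level = 0.95)$conf.int

round(c(mean = m, sd = s, ci_low = ci[1], ci_high = ci[2]), digits = 1)
```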
Correlate two columns
Description
Correlate two columns
Usage
tab_metrics_one_cor(
data,
col,
cross,
method = "pearson",
ci = FALSE,
digits = 2,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The first column holding metric values. |
cross |
The second column holding metric values. |
method |
The correlation method: TRUE or "pearson" for Pearson's r, "spearman" for Spearman's rho. |
ci |
Whether to output confidence intervals. |
digits |
The number of digits to print. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_counts. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_metrics_one_cor(data, use_private, sd_age)
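The two correlation methods and the confidence interval option can be illustrated with stats::cor.test from base R. This is only a sketch on invented vectors. Whether volker calls cor.test internally is not stated in this manual.

```r
# Hypothetical metric values without ties.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)

# Pearson's r with a 95% confidence interval.
pearson <- stats::cor.test(x, y, method = "pearson")
round(unname(pearson$estimate), 2)
pearson$conf.int

# Spearman's rho works on ranks; cor.test reports no confidence interval for it.
spearman <- stats::cor.test(x, y, method = "spearman")
round(unname(spearman$estimate), 2)
```

Because the example values are already ranks from 1 to 6, both coefficients coincide here. With real data they usually differ.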
Output a five point summary for groups
Description
Output a five point summary for groups
Usage
tab_metrics_one_grouped(
data,
col,
cross,
ci = FALSE,
digits = 1,
labels = TRUE,
clean = TRUE,
...
)
Arguments
data |
A tibble. |
col |
The column holding metric values. |
cross |
The column holding groups to compare. |
ci |
Whether to output 95% confidence intervals. |
digits |
The number of digits to print. |
labels |
If TRUE (default) extracts labels from the attributes, see codebook. |
clean |
Prepare data by data_clean. |
... |
Placeholder to allow calling the method with unused parameters from tab_metrics. |
Value
A volker tibble.
Examples
library(volker)
data <- volker::chatgpt
tab_metrics_one_grouped(data, sd_age, sd_gender)
Get, set, and modify the active ggplot theme
Description
See ggplot2::theme_set for details.
Define a default theme for volker plots
Description
Set ggplot colors, sizes and layout parameters.
Usage
theme_vlkr(
base_size = 11,
base_color = "black",
base_fill = VLKR_FILLDISCRETE,
base_gradient = VLKR_FILLGRADIENT
)
Arguments
base_size |
Base font size. |
base_color |
Base font color. |
base_fill |
A list of fill color sets or at least one fill color set. Example:
|
base_gradient |
A color vector used for creating gradient fill colors, e.g. in stacked bar plots. |
Details
Value
A theme function.
Examples
library(volker)
library(ggplot2)
data <- volker::chatgpt
theme_set(theme_vlkr(base_size=15, base_fill = list("red")))
plot_counts(data, sd_gender)
Tidy tibbles
Description
See tibble::tibble for details.
Tidy lm results, replace categorical parameter names by their levels and add the reference level
Description
Tidy lm results, replace categorical parameter names by their levels and add the reference level
Usage
tidy_lm_levels(fit)
Arguments
fit |
Result of a |
Value
A tibble with regression parameters.
Author(s)
Created with the help of ChatGPT.
Tidy tribbles
Description
See tibble::tribble for details.
Remove trailing zeros and trailing or leading whitespaces, colons, hyphens and underscores
Description
Remove trailing zeros and trailing or leading whitespaces, colons, hyphens and underscores
Usage
trim_label(x)
Arguments
x |
A character value. |
Value
The trimmed character value.
Remove a prefix from a character vector or a factor
Description
If the resulting character values would be empty, the prefix is returned. At the end, all items in the vector are trimmed using trim_label.
Usage
trim_prefix(x, prefix = TRUE)
Arguments
x |
A character or factor vector. |
prefix |
The prefix. Set to TRUE to first extract the prefix. |
Details
If x is a factor, the order of factor levels is retained.
Value
The trimmed character or factor vector.
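The behavior described for trim_prefix (strip the prefix, fall back to the prefix if nothing is left, then trim the result) can be sketched in base R. Both helper names are hypothetical. The sketch handles character vectors with an explicit prefix only, and its trimming step covers whitespace, colons, hyphens and underscores but not trailing zeros.

```r
# Hypothetical helper: strip whitespace, colons, hyphens and underscores
# from both ends of each string.
trim_ends <- function(x) {
  gsub("^[\\s:_-]+|[\\s:_-]+$", "", x, perl = TRUE)
}

# Hypothetical helper: remove a known prefix from a character vector.
trim_prefix_sketch <- function(x, prefix) {
  has_prefix <- startsWith(x, prefix)
  out <- ifelse(has_prefix, substring(x, nchar(prefix) + 1), x)

  # If nothing is left after removing the prefix, keep the prefix itself.
  out[out == ""] <- prefix

  trim_ends(out)
}

labels <- c("Usage: private", "Usage: work", "Usage: ")
trimmed <- trim_prefix_sketch(labels, "Usage: ")
trimmed  # "private" "work" "Usage"
```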
Truncate labels
Description
Truncate labels that exceed a specified maximum length.
Usage
trunc_labels(x, max_length = 20)
Arguments
x |
A character vector. |
max_length |
Maximum length, default is 20. The ellipsis "..." is appended to shortened labels. |
Value
A character vector with truncated labels.
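A base R sketch of the truncation described above follows. The helper name trunc_sketch is hypothetical. It assumes that labels are cut after max_length characters and that the ellipsis is appended on top of that; whether volker counts the ellipsis towards max_length is not stated here.

```r
# Hypothetical helper: shorten labels that exceed max_length characters.
trunc_sketch <- function(x, max_length = 20) {
  too_long <- nchar(x) > max_length

  # Assumption: keep the first max_length characters, then append "...".
  x[too_long] <- paste0(substr(x[too_long], 1, max_length), "...")
  x
}

short <- trunc_sketch(c("short", "a considerably longer label text"), max_length = 10)
short  # "short" "a consider..."
```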
Interpolate an alpha value based on case numbers
Description
Interpolate an alpha value based on case numbers
Usage
vlkr_alpha_interpolated(
n,
n_min = 20,
n_max = 100,
alpha_min = VLKR_POINT_ALPHA,
alpha_max = 1
)
Arguments
n |
Number of cases |
n_min |
The case number where the minimum alpha value starts |
n_max |
The case number where the maximum alpha value ends |
alpha_min |
The minimum alpha value |
alpha_max |
The maximum alpha value |
Value
A value between the minimum and the maximum alpha value
Get colors for discrete scales
Description
If the option ggplot2.discrete.fill is set, gets color values from the first list item that has enough colors and reverses them to start filling from the left in grouped bar charts.
Usage
vlkr_colors_discrete(n)
Arguments
n |
Number of colors. |
Details
Falls back to scale_fill_hue().
Value
A vector of colors.
Get colors for polarized scales
Description
Creates a gradient scale based on VLKR_FILLPOLARIZED.
Usage
vlkr_colors_polarized(n = NULL)
Arguments
n |
Number of colors or NULL to get the raw colors from the config |
Value
A vector of colors.
Get colors for sequential scales
Description
Creates a gradient scale based on VLKR_FILLGRADIENT.
Usage
vlkr_colors_sequential(n = NULL)
Arguments
n |
Number of colors or NULL to get the raw colors from the config |
Value
A vector of colors.
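Creating a gradient of n colors from a short vector of base colors, as the sequential and polarized color helpers do, can be sketched with grDevices::colorRampPalette. The two base colors are hypothetical placeholders, since the actual values of VLKR_FILLGRADIENT are not listed in this manual.

```r
# Hypothetical base colors standing in for the configured gradient.
base_colors <- c("#e5f7ff", "#005a8c")

# colorRampPalette returns a function that interpolates between the base colors.
gradient <- grDevices::colorRampPalette(base_colors)

# Five colors running from the first to the last base color.
cols <- gradient(5)
cols
```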
Wrap a string
Description
Wrap a string
Usage
wrap_label(x, width = 40)
Arguments
x |
A character vector. |
width |
The number of chars after which to break. |
Value
A character vector with wrapped strings.
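Wrapping long labels can be sketched with base R's strwrap. The helper name wrap_sketch is hypothetical, and strwrap's handling of the width may differ slightly from volker's: it breaks at whitespace and keeps each line shorter than width.

```r
# Hypothetical helper: insert line breaks into long strings.
wrap_sketch <- function(x, width = 40) {
  vapply(
    x,
    function(s) paste(strwrap(s, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

wrapped <- wrap_sketch("How often do you use ChatGPT in your private life?", width = 20)
cat(wrapped)
```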
Combine two identically shaped data frames by adding values of each column from the second data frame into the corresponding column in the first data frame using parentheses
Description
Combine two identically shaped data frames by adding values of each column from the second data frame into the corresponding column in the first data frame using parentheses
Usage
zip_tables(x, y, newline = TRUE, brackets = FALSE)
Arguments
x |
The first data frame. |
y |
The second data frame. |
newline |
Whether to add a new line character between the values (default: TRUE). |
brackets |
Whether to set the secondary values in brackets (default: FALSE). |
Value
A combined data frame.
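The cell-wise combination of two tables can be sketched in base R as follows. The helper name zip_sketch is hypothetical. It treats every column the same way, whereas volker may handle label columns separately, and it assumes that a space replaces the line break when newline is FALSE.

```r
# Hypothetical helper: combine two data frames of identical shape cell by cell.
zip_sketch <- function(x, y, newline = TRUE, brackets = FALSE) {
  stopifnot(identical(dim(x), dim(y)))

  # Assumption: a space separates the values when no newline is requested.
  sep <- if (newline) "\n" else " "

  out <- x
  for (col in seq_along(x)) {
    second <- as.character(y[[col]])
    if (brackets) second <- paste0("(", second, ")")
    out[[col]] <- paste0(x[[col]], sep, second)
  }
  out
}

means <- data.frame(female = c(3.1, 2.4), male = c(2.9, 2.8))
sds <- data.frame(female = c(0.9, 1.1), male = c(1.2, 0.7))

zipped <- zip_sketch(means, sds, newline = FALSE, brackets = TRUE)
zipped$female  # "3.1 (0.9)" "2.4 (1.1)"
```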