---
title: "Workflow Patterns"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Workflow Patterns}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(ReproStat)
set.seed(20260324)
```

## Why workflow patterns help

Most users do not need every feature in ReproStat at once. They usually need a small number of reliable patterns that fit real analysis situations. This article shows several practical ways to use the package.

## Pattern 1: Standard regression stability check

Use this when you already have a preferred regression specification and want to know how stable its outputs are.

```{r pattern1}
diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

reproducibility_index(diag_obj)
selection_stability(diag_obj)
```

Recommended outputs to review:

- `reproducibility_index()`
- `selection_stability()`
- `plot_stability(diag_obj, "selection")`

## Pattern 2: Sample-composition sensitivity

Use this when you are worried that the model depends too strongly on exactly which observations appear in the sample.

```{r pattern2}
diag_sub <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "subsample",
  frac = 0.75
)

reproducibility_index(diag_sub)
```

This pattern is often useful for:

- smaller datasets
- observational studies
- analyses where the influence of sample composition is a concern

## Pattern 3: Measurement-noise stress test

Use noise perturbation when predictors may be measured with minor error and you want to know whether the fitted result is sensitive to that noise.

```{r pattern3}
diag_noise <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 150,
  method = "noise",
  noise_sd = 0.05
)

reproducibility_index(diag_noise)
prediction_stability(diag_noise)$mean_variance
```

This does not replace a full measurement-error model, but it provides a useful practical stress test.
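A natural extension of this pattern is to repeat the stress test at several noise levels and watch how stability degrades. The sketch below assumes the `run_diagnostics()` and `reproducibility_index()` calls shown above; the `as.numeric()` extraction is an assumption about the return value, so adjust it if your version returns a richer object. The chunk is not evaluated here for that reason.

```{r pattern3-sweep, eval = FALSE}
# Sketch: sweep noise_sd to see how quickly the reproducibility
# index falls off as measurement noise grows.
noise_levels <- c(0.01, 0.05, 0.10)

ri_by_noise <- vapply(noise_levels, function(s) {
  d <- run_diagnostics(
    mpg ~ wt + hp + disp,
    data = mtcars,
    B = 100,
    method = "noise",
    noise_sd = s
  )
  # Assumption: the index coerces to a single numeric value.
  as.numeric(reproducibility_index(d))[1]
}, numeric(1))

data.frame(noise_sd = noise_levels, ri = ri_by_noise)
```

A sharp drop between adjacent noise levels is usually more informative than any single value.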
## Pattern 4: Logistic model reproducibility

Use a GLM backend when the response is binary.

```{r pattern4}
diag_glm <- run_diagnostics(
  am ~ wt + hp + qsec,
  data = mtcars,
  B = 150,
  backend = "glm",
  family = stats::binomial()
)

reproducibility_index(diag_glm)
```

This pattern is helpful when you want to know whether a classification-style conclusion is stable under perturbation, not just whether the original fit looks significant.

## Pattern 5: Robust regression with outlier concern

If you suspect that outliers or heavy tails are affecting the OLS result, compare against a robust backend.

```{r pattern5, eval = requireNamespace("MASS", quietly = TRUE)}
if (requireNamespace("MASS", quietly = TRUE)) {
  diag_rlm <- run_diagnostics(
    mpg ~ wt + hp + disp,
    data = mtcars,
    B = 150,
    backend = "rlm"
  )
  reproducibility_index(diag_rlm)
}
```

This is often a useful companion analysis rather than a full replacement.

## Pattern 6: Penalized regression and variable retention

Use `glmnet` when you care about regularized modeling and whether variables are selected consistently.

```{r pattern6, eval = requireNamespace("glmnet", quietly = TRUE)}
if (requireNamespace("glmnet", quietly = TRUE)) {
  diag_lasso <- run_diagnostics(
    mpg ~ wt + hp + disp + qsec,
    data = mtcars,
    B = 150,
    backend = "glmnet",
    en_alpha = 1
  )
  reproducibility_index(diag_lasso)
  selection_stability(diag_lasso)
}
```

Here, selection stability is especially informative because it reflects how often each variable is retained with a non-zero coefficient.

## Pattern 7: Compare candidate models by ranking stability

When you have multiple plausible formulas, repeated CV ranking stability helps you see whether one model wins consistently or only occasionally.
```{r pattern7}
models <- list(
  compact  = mpg ~ wt + hp,
  standard = mpg ~ wt + hp + disp,
  expanded = mpg ~ wt + hp + disp + qsec
)

cv_obj <- cv_ranking_stability(models, mtcars, v = 5, R = 40)
cv_obj$summary
```

Focus on two columns:

- `mean_rank`: lower is better on average
- `top1_frequency`: higher means the model is more often the winner

These two quantities are related but not identical: a model can have the best average rank while rarely finishing first, for example when it is consistently second.

## Pattern 8: Reporting a compact reproducibility section

A practical reporting workflow is:

1. fit a diagnostic object with your primary model
2. compute the RI and a confidence interval
3. report the most unstable component
4. include one or two plots
5. if model selection is part of the analysis, add CV ranking stability

Example:

```{r pattern8}
diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 150,
  method = "bootstrap"
)

ri <- reproducibility_index(diag_obj)
ci <- ri_confidence_interval(diag_obj, R = 300, seed = 1)

ri
ci
```

## A practical decision checklist

Before running a large analysis, decide:

1. which perturbation method reflects your concern
2. how many iterations `B` are feasible
3. whether prediction stability should be in-sample or on `predict_newdata`
4. whether you want a standard, robust, or penalized backend
5. whether model comparison is part of the task

## Next steps

Use these patterns as templates, then adapt them to your own data, formulas, and modeling constraints. For conceptual interpretation, read the interpretation article. For function-level details, use the reference pages.
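The decision checklist can be collected into a single template script. This is a sketch built from the calls used earlier in this article: passing a held-out data frame via a `predict_newdata` argument is an assumption about the `run_diagnostics()` signature (check `?run_diagnostics` for the exact argument name in your installed version), so the chunk is not evaluated.

```{r template, eval = FALSE}
# Template: one block covering the five checklist decisions.
# Adapt the formula, data, and settings to your own analysis.
holdout <- mtcars[1:8, ]          # 3. data for out-of-sample prediction stability

diag_main <- run_diagnostics(
  mpg ~ wt + hp + disp,           # your primary specification
  data = mtcars,
  B = 200,                        # 2. as many iterations as are feasible
  method = "bootstrap",           # 1. perturbation matching your concern
  backend = "lm",                 # 4. standard, robust, or penalized backend
  predict_newdata = holdout       # 3. assumed argument name; see note above
)

reproducibility_index(diag_main)
ri_confidence_interval(diag_main, R = 300, seed = 1)

# 5. Add cv_ranking_stability() here if model comparison is part of the task.
```

Keeping this template under version control alongside the analysis makes the reproducibility section easy to regenerate.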