---
title: "Workflow Patterns"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Workflow Patterns}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(ReproStat)
set.seed(20260324)
```

## Why workflow patterns help

Most users do not need every feature in ReproStat at once. They usually need a small number of reliable patterns that fit real analysis situations. This article shows several practical ways to use the package.

## Pattern 1: Standard regression stability check

Use this when you already have a preferred regression specification and want to know how stable its outputs are.

```{r pattern1}
diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

reproducibility_index(diag_obj)
selection_stability(diag_obj)
```

Recommended outputs to review:

- `reproducibility_index()`
- `selection_stability()`
- `plot_stability(diag_obj, "selection")`

## Pattern 2: Sample-composition sensitivity

Use this when you are worried that the model depends too strongly on exactly which observations appear in the sample.

```{r pattern2}
diag_sub <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "subsample",
  frac = 0.75
)

reproducibility_index(diag_sub)
```

This pattern is often useful for:

- smaller datasets
- observational studies
- analyses where the influence of sample composition is a concern

## Pattern 3: Measurement-noise stress test

Use noise perturbation when predictors may be measured with minor error and you want to know whether the fitted result is sensitive to that noise.

```{r pattern3}
diag_noise <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 150,
  method = "noise",
  noise_sd = 0.05
)

reproducibility_index(diag_noise)
prediction_stability(diag_noise)$mean_variance
```

This does not replace a full measurement-error model, but it provides a useful practical stress test.
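A natural extension of this pattern is to repeat the stress test at several noise levels and watch how stability degrades. The sketch below assumes the `run_diagnostics()` and `reproducibility_index()` calls shown above; the `as.numeric()` extraction is an assumption about the return value, so adjust it if your version returns a richer object. The chunk is not evaluated here for that reason.

```{r pattern3-sweep, eval = FALSE}
# Sketch: sweep noise_sd to see how quickly the reproducibility
# index falls off as measurement noise grows.
noise_levels <- c(0.01, 0.05, 0.10)

ri_by_noise <- vapply(noise_levels, function(s) {
  d <- run_diagnostics(
    mpg ~ wt + hp + disp,
    data = mtcars,
    B = 100,
    method = "noise",
    noise_sd = s
  )
  # Assumption: the index coerces to a single numeric value.
  as.numeric(reproducibility_index(d))[1]
}, numeric(1))

data.frame(noise_sd = noise_levels, ri = ri_by_noise)
```

A sharp drop between adjacent noise levels is usually more informative than any single value.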
## Pattern 4: Logistic model reproducibility

Use a GLM backend when the response is binary.

```{r pattern4}
diag_glm <- run_diagnostics(
  am ~ wt + hp + qsec,
  data = mtcars,
  B = 150,
  backend = "glm",
  family = stats::binomial()
)

reproducibility_index(diag_glm)
```

This pattern is helpful when you want to know whether a classification-style conclusion is stable under perturbation, not just whether the original fit looks significant.

## Pattern 5: Robust regression with outlier concern

If you suspect that outliers or heavy tails are affecting the OLS result, compare against a robust backend.

```{r pattern5, eval = requireNamespace("MASS", quietly = TRUE)}
if (requireNamespace("MASS", quietly = TRUE)) {
  diag_rlm <- run_diagnostics(
    mpg ~ wt + hp + disp,
    data = mtcars,
    B = 150,
    backend = "rlm"
  )
  reproducibility_index(diag_rlm)
}
```

This is often a useful companion analysis rather than a full replacement.

## Pattern 6: Penalized regression and variable retention

Use `glmnet` when you care about regularized modeling and whether variables are selected consistently.

```{r pattern6, eval = requireNamespace("glmnet", quietly = TRUE)}
if (requireNamespace("glmnet", quietly = TRUE)) {
  diag_lasso <- run_diagnostics(
    mpg ~ wt + hp + disp + qsec,
    data = mtcars,
    B = 150,
    backend = "glmnet",
    en_alpha = 1
  )
  reproducibility_index(diag_lasso)
  selection_stability(diag_lasso)
}
```

Here, selection stability is especially informative because it reflects how often each variable is retained with a non-zero coefficient.

## Pattern 7: Compare candidate models by ranking stability

When you have multiple plausible formulas, repeated CV ranking stability helps you see whether one model wins consistently or only occasionally.
```{r pattern7}
models <- list(
  compact  = mpg ~ wt + hp,
  standard = mpg ~ wt + hp + disp,
  expanded = mpg ~ wt + hp + disp + qsec
)

cv_obj <- cv_ranking_stability(models, mtcars, v = 5, R = 40)
cv_obj$summary
```

Focus on two columns:

- `mean_rank`: lower is better on average
- `top1_frequency`: higher means the model is more often the winner

These two quantities are related but not identical: a model can have the best average rank while rarely finishing first, for example when it is consistently second.

## Pattern 8: Reporting a compact reproducibility section

A practical reporting workflow is:

1. fit a diagnostic object with your primary model
2. compute the RI and a confidence interval
3. report the most unstable component
4. include one or two plots
5. if model selection is part of the analysis, add CV ranking stability

Example:

```{r pattern8}
diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 150,
  method = "bootstrap"
)

ri <- reproducibility_index(diag_obj)
ci <- ri_confidence_interval(diag_obj, R = 300, seed = 1)

ri
ci
```

## A practical decision checklist

Before running a large analysis, decide:

1. which perturbation method reflects your concern
2. how many iterations `B` are feasible
3. whether prediction stability should be in-sample or on `predict_newdata`
4. whether you want a standard, robust, or penalized backend
5. whether model comparison is part of the task

## Next steps

Use these patterns as templates, then adapt them to your own data, formulas, and modeling constraints. For conceptual interpretation, read the interpretation article. For function-level details, use the reference pages.
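The decision checklist can be collected into a single template script. This is a sketch built from the calls used earlier in this article: passing a held-out data frame via a `predict_newdata` argument is an assumption about the `run_diagnostics()` signature (check `?run_diagnostics` for the exact argument name in your installed version), so the chunk is not evaluated.

```{r template, eval = FALSE}
# Template: one block covering the five checklist decisions.
# Adapt the formula, data, and settings to your own analysis.
holdout <- mtcars[1:8, ]          # 3. data for out-of-sample prediction stability

diag_main <- run_diagnostics(
  mpg ~ wt + hp + disp,           # your primary specification
  data = mtcars,
  B = 200,                        # 2. as many iterations as are feasible
  method = "bootstrap",           # 1. perturbation matching your concern
  backend = "lm",                 # 4. standard, robust, or penalized backend
  predict_newdata = holdout       # 3. assumed argument name; see note above
)

reproducibility_index(diag_main)
ri_confidence_interval(diag_main, R = 300, seed = 1)

# 5. Add cv_ranking_stability() here if model comparison is part of the task.
```

Keeping this template under version control alongside the analysis makes the reproducibility section easy to regenerate.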