--- title: "Asymmetric Smoothed-Association Matrices" output: rmarkdown::html_vignette: toc: true fig_width: 7 fig_height: 7 vignette: > %\VignetteIndexEntry{Asymmetric Smoothed-Association Matrices} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.alt = paste( "Asymmetric smoothed-association matrix produced by", "janusplot(); diagonal cells hold variable labels,", "off-diagonal cells show the fitted mgcv::gam spline", "with a 95% confidence envelope, raw-data scatter,", "and per-cell annotations for n, EDF, and smooth", "significance glyph." ), fig.width = 7, fig.height = 7, dpi = 110, eval = TRUE ) library(ggplot2) set.seed(2026L) ``` ## Why Pearson is not enough A Pearson correlation matrix gives one scalar per pair of variables. Two numbers are discarded in that collapse: 1. The *shape* of the association — linear, monotone non-linear, U-shaped, or irregular. 2. The *direction* — whether $y$ is a smooth function of $x$ differs in general from whether $x$ is a smooth function of $y$, because leverage and noise are directional. `janusplot()` renders both recoveries visually for every pair in a matrix layout, using proper `mgcv::gam()` fits (not loess) so EDF, F-tests, and random effects are available. ## Quick start ```{r quickstart} library(janusplot) # Four numeric columns from mtcars (32 rows: small but illustrative) janusplot(mtcars[, c("mpg", "hp", "wt", "qsec")]) ``` Each off-diagonal cell shows: - raw scatter (light grey), - the fitted spline (blue line) and 95% CI ribbon, - EDF (effective degrees of freedom) in the bottom-right, - *n* used in the bottom-left, - a signif-glyph in the top-right (`***` / `**` / `*` / `·`). The cell fill is keyed to EDF: darker = more non-linear. ## Non-linear detection A synthetic quadratic + sinusoidal example. The matrix makes it obvious which variables are genuinely non-linearly related to which. ```{r nonlinear} n <- 300 x1 <- runif(n, -3, 3) x2 <- x1^2 + rnorm(n, sd = 0.6) # quadratic on x1 x3 <- sin(x1) + rnorm(n, sd = 0.4) # sinusoidal on x1 x4 <- rnorm(n) # independent d <- data.frame(x1 = x1, x2 = x2, x3 = x3, x4 = x4) janusplot(d) ``` EDF for `x2 ~ s(x1)` and `x3 ~ s(x1)` should clearly exceed 1; the cell fills reflect that. Cells involving `x4` should be close to EDF = 1 (linear / flat). ## Asymmetry — a heteroscedastic example When the noise scale depends on a predictor, the two directional smooths diverge: $y \sim s(x)$ recovers the mean relationship; $x \sim s(y)$ is distorted by the variance asymmetry. ```{r asym} n <- 400 x <- runif(n, 0, 5) y <- 0.5 * x + rnorm(n, sd = 0.3 + 0.4 * x) # variance grows with x d <- data.frame(x = x, y = y, z = rnorm(n)) janusplot(d) ``` The `A = ...` label per cell reports the asymmetry index $A_{ij} = |EDF_{y|x} - EDF_{x|y}| / (EDF_{y|x} + EDF_{x|y}) \in [0, 1]$, shown by default in the bottom-left corner alongside `EDF = ...`. ## Partial smooths (controlling for covariates) Pass `adjust =` as a one-sided formula RHS to include fixed covariates and/or random effects in every pairwise GAM. 
## Partial smooths (controlling for covariates)

Pass a one-sided formula to `adjust =` to include fixed covariates and/or random effects in every pairwise GAM.

```{r adjust, eval = requireNamespace("palmerpenguins", quietly = TRUE)}
library(palmerpenguins)
pp <- na.omit(penguins)

# Without covariate
janusplot(pp[, c("bill_length_mm", "bill_depth_mm",
                 "flipper_length_mm", "body_mass_g")])

# With species as a fixed effect — resolves Simpson's-paradox geometry
janusplot(pp,
          vars = c("bill_length_mm", "bill_depth_mm",
                   "flipper_length_mm", "body_mass_g"),
          adjust = ~ species)
```

## Changing the palette

The cell fill encodes the EDF (or deviance-explained) of the smooth and is accompanied by a shared colourbar legend. Choose a palette with `palette = `.

```{r palette-viridis, fig.height = 5, fig.width = 6}
d <- data.frame(
  x1 = runif(200, -3, 3),
  x2 = rnorm(200),
  x3 = rnorm(200)
)
d$x2 <- d$x1^2 + rnorm(200, sd = 0.8) # non-linear

janusplot(d, palette = "viridis") # default, colourblind-safe
```

```{r palette-brewer, fig.height = 5, fig.width = 6}
janusplot(d, palette = "RdYlBu") # diverging, colourblind-safe
```

```{r palette-turbo, fig.height = 5, fig.width = 6}
janusplot(d, palette = "turbo") # high-contrast, NOT colourblind-safe
```

**Colourblind-safe choices:**

- *Sequential:* `viridis` (default), `magma`, `inferno`, `plasma`, `cividis`, `mako`, `rocket`, `YlOrRd`, `YlGnBu`, `Blues`, `Greens`.
- *Diverging:* `RdYlBu`, `RdBu`, `PuOr`.

**High-contrast but not colourblind-safe:** `turbo`, `Spectral`.

## Handling missing data

```{r missing}
# airquality has genuine NAs in Ozone and Solar.R
janusplot(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")],
          na_action = "pairwise")
```

`na_action = "pairwise"` uses all rows for which *that* pair is complete; `"complete"` restricts to rows complete across every variable (matching listwise deletion).

## Scaling up — `order = "hclust"`

When the number of variables is large, reorder the axes by hierarchical clustering on |correlation|:

```{r hclust, eval = requireNamespace("MASS", quietly = TRUE)}
data(Boston, package = "MASS")
janusplot(Boston[, c("medv", "lstat", "rm", "age", "indus", "nox", "dis")],
          order = "hclust")
```

## Programmatic access — `janusplot_data()`

`janusplot_data()` returns the raw GAM fits and per-cell metrics without constructing a ggplot — useful for custom rendering or downstream analysis.

```{r data}
# Re-create the heteroscedastic example
n <- 400
het <- data.frame(
  x = runif(n, 0, 5),
  y = NA_real_
)
het$y <- 0.5 * het$x + rnorm(n, sd = 0.3 + 0.4 * het$x)

out <- janusplot_data(het, vars = c("x", "y"))
out$pairs[[1L]]$edf_yx
out$pairs[[1L]]$edf_xy
out$pairs[[1L]]$asymmetry_index
```

## Shape metrics explained

Every fitted smooth is summarised by two continuous indices and two discrete counts. These drive the 24-category classifier and appear as columns in `janusplot(..., with_data = TRUE)$data` and as fields on each entry of `janusplot_data()$pairs`.

Let `f(x)` be the fitted smooth on a dense grid of equally-spaced points across the predictor range (100 by default; see *Limitations*), with `f'` and `f''` the numerical first and second derivatives. Let `w(x)` be the empirical density of the predictor on the same grid, normalised to `sum(w) = 1`.

- **`monotonicity_index`** (paper symbol `M`): `M = sum(w * f') / sum(w * |f'|) in [-1, 1]`.
  `+1` means strictly increasing, `-1` strictly decreasing, `0` a non-monotone curve (bowl, dome, wave).
- **`convexity_index`** (paper symbol `C`): `C = sum(w * f'') / sum(w * |f''|) in [-1, 1]`.
  `+1` means globally convex (bowl-up), `-1` globally concave (bowl-down), `0` inflection-dominated (S-curve, sine, flat).

Both indices are density-weighted so they describe the smooth *where the data actually live*, not extrapolated tails, and are scale-invariant: replacing `y` with `a * y + b` leaves them unchanged. A standalone numerical sketch of both indices closes this section.

- **`n_turning_points`** — count of interior extrema (sign changes of `f'`), robust to noise via lobe-mass weighting.
- **`n_inflections`** — count of interior curvature flips (sign changes of `f''`), with the same robust counting.

Together, the pair `(n_turning_points, n_inflections)` drives the primary `shape_category` dispatch; the pair `(monotonicity_index, convexity_index)` disambiguates within the monotone `(0, 0)` and single-extremum `(1, 0)` cells. The full taxonomy with 2-letter codes, archetypes, and thumbnail curves is available from `janusplot_shape_hierarchy()` and is rendered as the standing legend below every `janusplot()` call. Tune the thresholds applied to these indices via `janusplot_shape_cutoffs()`. See the [shape-recognition-sensitivity](shape-recognition-sensitivity.html) vignette for how faithfully the classifier recovers ground-truth shapes across sample-size and noise regimes.
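As promised above, here is a standalone numerical sketch of `M` and `C`: plain base R applying the definitions to a toy curve, with uniform weights standing in for the empirical predictor density (the package itself works from the fitted smooth and the observed density).

```{r shape-index-sketch}
# Toy curve: a pure bowl. Non-monotone, globally convex, so expect M ~ 0, C = +1.
grid <- seq(-3, 3, length.out = 200)
h    <- grid[2] - grid[1]
f    <- grid^2
f1   <- diff(f)  / h                      # numerical first derivative
f2   <- diff(f1) / h                      # numerical second derivative
w1   <- rep(1 / length(f1), length(f1))   # uniform stand-in for the density weights
w2   <- rep(1 / length(f2), length(f2))
M <- sum(w1 * f1) / sum(w1 * abs(f1))     # increase and decrease cancel, giving ~0
C <- sum(w2 * f2) / sum(w2 * abs(f2))     # curvature has a single sign, giving +1
round(c(M = M, C = C), 3)
```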
## Derivative views: theoretical justification and applied use

Each matrix renders one quantity. `display = "fit"` (default) shows the fitted smooth; `display = "d1"` shows $\hat f'(x)$; `display = "d2"` shows $\hat f''(x)$. A top-of-matrix title names the mode, so side-by-side calls compare unambiguously. Orders beyond two are not exposed — see *Noise amplification* below. Derivative CI rendering is **off by default**; opt in with `derivative_ci = "pointwise"` or `"simultaneous"`.

```{r derivs-fit, fig.height = 4.5}
set.seed(2026L)
n  <- 300L
xs <- runif(n, -pi, pi)
df <- data.frame(
  x  = xs,
  y1 = xs + sin(3 * xs) + rnorm(n, sd = 0.15),
  y2 = 0.5 * xs^2 + rnorm(n, sd = 0.8)
)

janusplot(df, display = "fit", show_shape_legend = FALSE)
```

```{r derivs-d1, fig.height = 4.5}
janusplot(df, display = "d1", show_shape_legend = FALSE)
```

```{r derivs-d2, fig.height = 4.5}
janusplot(df, display = "d2", show_shape_legend = FALSE)
```

Simultaneous bands are one argument away; the Monte Carlo critical multiplier follows Simpson (2018):

```{r derivs-sim, fig.height = 4.5}
janusplot(df, display = "d1",
          derivative_ci = "simultaneous",
          derivative_ci_nsim = 2000L,
          show_shape_legend = FALSE)
```

### What derivatives reveal that the fit hides

The fitted smooth $\hat f(x) = \mathbb{E}[y\mid x]$ is a level description. Its derivatives are different statistical objects with their own interpretations:

- $\hat f'(x)$ — the **local rate of change** of $y$ in $x$. Zero crossings localise the turning points of $\hat f$; the sign of $\hat f'$ gives the direction of monotonicity; the magnitude gives the sensitivity at the operating point $x$. In control engineering this is literally the process gain $K(x) = \partial y / \partial u$ that gain-scheduled controllers are built around (Rugh & Shamma, 2000; Leith & Leithead, 2000). In causal analysis of a continuous treatment it is the derivative of the dose--response curve $\mu'(t) = \partial \mathbb{E}[Y(t)] / \partial t$, which Zhang & Chen (2025) argue is often *the* treatment-effect object of interest, not the curve itself.
- $\hat f''(x)$ — the **local curvature**. Zero crossings localise the inflection points of $\hat f$; a persistently positive second derivative flags accelerating growth, a persistently negative one flags saturation (diminishing returns). $\hat f''$ is the input to the convexity index $C$ defined earlier in this vignette, so the derivative panel exposes the *local* signal behind that scalar summary.

The asymmetric matrix layout sharpens this. `janusplot()` fits both $\hat f_{y\mid x}(x)$ and $\hat f_{x\mid y}(y)$, so derivative panels on the two triangles answer genuinely different questions: the upper triangle asks "how steeply does $y$ respond to a nudge in $x$ at this operating point" (forward gain); the lower triangle asks "how steeply must $x$ change to induce a unit change in $y$" (inverse sensitivity). For an asymmetric process these do not transpose into each other, and the directional asymmetry is a diagnostic the symmetric correlation matrix cannot expose (Janzing & Schölkopf, 2010).

### Estimation — the LP matrix

Let $X_p = X_p(\mathbf{x}_g)$ denote the design (linear predictor) matrix of the fitted GAM evaluated on the plotting grid $\mathbf{x}_g$, obtained from `predict(gam_fit, newdata = ..., type = "lpmatrix")` (Wood, 2017, §7.2.4). With penalised posterior mean $\hat{\boldsymbol\beta} = \mathtt{coef(gam\_fit)}$ and posterior covariance $V_p = \mathtt{gam\_fit\$Vp}$, we construct a finite-difference operator $D^{(k)}$ acting on the *rows* of $X_p$ (central differences in the interior, second-order forward / backward stencils at the endpoints) and read off
$$
\hat f^{(k)}(x_i) = \bigl[D^{(k)} X_p \hat{\boldsymbol\beta}\bigr]_i,
\qquad
\widehat{\mathrm{Var}}\bigl(\hat f^{(k)}(x_i)\bigr) = \bigl[D^{(k)} X_p V_p X_p^{\!\top} \bigl(D^{(k)}\bigr)^{\!\top}\bigr]_{ii}.
$$
Pointwise $95\%$ intervals are $\hat f^{(k)}(x) \pm 1.96\,\sqrt{\widehat{\mathrm{Var}}\bigl(\hat f^{(k)}(x)\bigr)}$. This is the standard Wood (2017) construction, and is what `gratia::derivatives()` implements in its default mode (Simpson, 2014; Simpson, 2018). Columns of $X_p$ corresponding to `adjust` terms held at typical values are constant down the grid rows, so their finite differences are zero and they drop out of both $\hat f^{(k)}$ and its variance — the derivative in the panel is therefore the derivative of the *partial smooth actually shown in the fit panel*, as expected.
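For readers who want to see the construction outside `janusplot()`, the following is a minimal from-scratch sketch of the pointwise case. It reuses the `df` data frame from the derivative chunks above and assumes only `mgcv`; it is an illustration, not the package's internal code.

```{r lpmatrix-sketch}
# Fit one directional smooth ourselves and differentiate it via the LP matrix.
library(mgcv)
fit <- gam(y1 ~ s(x), data = df, method = "REML")

xg  <- seq(min(df$x), max(df$x), length.out = 200)  # plotting grid
eps <- diff(range(df$x)) / 1000                     # central-difference step

# LP-matrix rows evaluated at x +/- eps, then differenced: this plays the role
# of D applied to the rows of X_p.
Xp_fwd <- predict(fit, data.frame(x = xg + eps), type = "lpmatrix")
Xp_bwd <- predict(fit, data.frame(x = xg - eps), type = "lpmatrix")
Xd     <- (Xp_fwd - Xp_bwd) / (2 * eps)

d1 <- drop(Xd %*% coef(fit))                        # f'(x) on the grid
se <- sqrt(rowSums((Xd %*% vcov(fit)) * Xd))        # diag(Xd Vp Xd'), no big matrix
head(cbind(x = xg, d1 = d1, lower = d1 - 1.96 * se, upper = d1 + 1.96 * se))
```

This sketch uses a symmetric step everywhere, including slightly past the grid ends, whereas the construction described above switches to one-sided stencils at the endpoints; for real analyses prefer `gratia::derivatives()` or `janusplot_data()`.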
For simultaneous intervals over the full grid (a stricter question than pointwise, and what you want for formal feature localisation), `janusplot()` implements the Monte Carlo construction of Ruppert, Wand & Carroll (2003, §6.5), popularised for GAMs by Simpson (2018): draw $\tilde{\boldsymbol\beta}_b \sim \mathcal{N}(\hat{\boldsymbol\beta}, V_p)$ for $b = 1, \ldots, B$ and take the $(1-\alpha)$ quantile of $\max_i \bigl|\bigl[D^{(k)} X_p (\tilde{\boldsymbol\beta}_b - \hat{\boldsymbol\beta})\bigr]_i\bigr| / \mathtt{se}_i$ across the plotting grid as a critical multiplier $c_\alpha$ on the pointwise SE, so the simultaneous band is $\hat f^{(k)}(x) \pm c_\alpha\,\mathtt{se}(x)$. Opt in via `derivative_ci = "simultaneous"` on either `janusplot()` or `janusplot_data()`; the default is `derivative_ci = "none"`, so no CI is drawn unless you ask for one — derivative ribbons invite over-reading of local features and should be a deliberate choice, not a default. The default number of draws is $B = 1000$ (see `derivative_ci_nsim`); Simpson (2018) uses $10\,000$, which is affordable if you need tighter quantile estimation.
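Continuing the sketch from the previous subsection (same `fit`, `Xd`, `d1`, and `se` objects), the critical multiplier can be reproduced in a few lines; the exact value will differ slightly from the package output because the draws are random.

```{r simultaneous-sketch}
# Monte Carlo critical multiplier for a simultaneous 95% band over the grid.
set.seed(1)
B     <- 1000
betas <- mgcv::rmvn(B, coef(fit), vcov(fit))  # B draws from N(beta_hat, Vp)
dev   <- Xd %*% (t(betas) - coef(fit))        # simulated f' deviations, grid x B
crit  <- quantile(apply(abs(dev) / se, 2, max), 0.95)
crit                                          # typically exceeds the pointwise 1.96

sim_band <- cbind(lower = d1 - crit * se, upper = d1 + crit * se)
```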
### Noise amplification and why we cap at $k = 2$

Finite differencing of raw data amplifies noise; penalised splines do not eliminate that amplification, they trade it against bias via the REML-selected smoothing parameter. `mgcv`'s default thin-plate penalty is on $\int (f'')^2$, which directly regularises $\hat f''$ (and with it $\hat f'$); derivatives beyond the second are bounded, but not penalised, only via the basis rank (Wood, 2017, §5.3; Eilers & Marx, 1996). In practice we find $\hat f^{(3)}$ is dominated by noise for $n < 10^4$ at moderate basis dimensions, so `janusplot()` refuses derivative orders $k \ge 3$ by design. If you have a domain-specific reason to need a higher-order derivative, specify a matching-order P-spline penalty explicitly (Eilers, Marx & Durbán, 2015) and extract it yourself from `janusplot_data()`.

### Applied use: gain estimation and dose--response

Two applied strands in which the asymmetric derivative view is not a cosmetic add-on but the analytical primitive the practitioner actually wants:

- **Process-gain scheduling.** In adaptive and gain-scheduled control, the controller is indexed by the local process gain $K(x) = \partial y / \partial u$ (Rugh & Shamma, 2000). For a steady-state input-output dataset, $\hat f_{y\mid u}'(u)$ is a direct data-driven estimate of $K(u)$, and its simultaneous CI tells the engineer whether the local gain is distinguishable from a reference gain over an operating envelope. The inverse panel $\hat f_{u\mid y}'(y)$ is the feedforward-linearisation sensitivity; a large divergence between the two panels flags that a naive inverse controller will under-perform (Korda & Mezić, 2018). The matrix view makes a fleet of such pairs inspectable at once.
- **Derivative of the dose--response curve as the causal estimand.** For a continuous treatment $T$ with unconfoundedness, the dose--response $\mu(t) = \mathbb{E}[Y(t)]$ and its derivative $\mu'(t)$ are both estimable, and recent work (Zhang & Chen, 2025) argues that $\mu'(t)$ is often the more directly interpretable quantity — it answers "how much does the expected outcome change per unit shift in treatment at this dose?" This is structurally the same estimand as the process gain above; the asymmetric-matrix derivative panel delivers both forward and reverse-conditioned derivative curves in the same frame, which is a direct diagnostic for Simpson's-paradox-style conditioning reversals (visible, for example, in the penguins `bill_depth_mm` $\times$ `body_mass_g` pair once `species` is adjusted for).

### References cited in this section

Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. *Statistical Science*, **11**(2), 89--121.

Eilers, P. H. C., Marx, B. D., & Durbán, M. (2015). Twenty years of P-splines. *SORT*, **39**(2), 149--186.

Janzing, D., & Schölkopf, B. (2010). Causal inference using the algorithmic Markov condition. *IEEE Transactions on Information Theory*, **56**(10), 5168--5194.

Korda, M., & Mezić, I. (2018). Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. *Automatica*, **93**, 149--160.

Leith, D. J., & Leithead, W. E. (2000). Survey of gain-scheduling analysis and design. *International Journal of Control*, **73**(11), 1001--1025.

Rugh, W. J., & Shamma, J. S. (2000). Research on gain scheduling. *Automatica*, **36**(10), 1401--1425.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). *Semiparametric Regression*. Cambridge University Press.

Simpson, G. L. (2014). Simultaneous confidence intervals for derivatives of smooth terms in a GAM. *From the Bottom of the Heap* (blog post).

Simpson, G. L. (2018). Modelling palaeoecological time series using generalised additive models. *Frontiers in Ecology and Evolution*, **6**, 149.
Wood, S. N. (2017). *Generalized Additive Models: An Introduction with R* (2nd ed.). Chapman and Hall/CRC.

Zhang, Y., & Chen, Y.-C. (2025). Doubly robust inference on causal derivative effects for continuous treatments. arXiv preprint arXiv:2501.06969.

## Limitations

- Pairwise view, not conditional — always complement with a proper multivariate model.
- EDF depends on basis dimension `k`; defaults are sensible but domain-specific tuning is encouraged.
- The asymmetry index should not be interpreted causally without strong assumptions.
- `monotonicity_index` and `convexity_index` are scale-invariant in `y` but sensitive to the predictor-density weighting — they describe the smooth on the observed support of `x`, not outside it.
- `display` is scalar — a single `janusplot()` call renders a single quantity (fit, d1, or d2). To compare fit against derivative, issue two or three calls; each carries its own matrix-level title and, when `with_data = TRUE`, its own `display`-tagged summary table.
- Derivative panels show **no confidence ribbon by default** (`derivative_ci = "none"`). Opt in explicitly: `"pointwise"` for marginally-valid pointwise 95% bands, `"simultaneous"` for Simpson (2018) Monte Carlo bands valid for feature localisation.
- Requesting `display %in% c("d1", "d2")` raises the default prediction-grid resolution from 100 to 200 points, which slightly shifts the numeric shape-metric values (`M`, `C`, turning and inflection counts) reported alongside the fit. Shapes and asymmetry — the primary reading of the matrix — are robust to this drift; `M`, `C`, and the counts are secondary diagnostics. The precomputed `shape_sensitivity_demo` dataset was generated under `n_grid = 100` and is preserved as-is for reproducibility.

## Citation

```{r cite}
citation("janusplot")
```