muse v0.1.0 ======= * Joint maximum-likelihood Box-Cox lambda estimation: corrected the lambda gradient. The optimiser's lambda derivative was a forward difference using the optimiser's cached objective value, but that value is not a reliable f(p) at the gradient point -- the backward smoother and the structural- gradient loop that run first mutate persistent Kalman-filter state that the likelihood reads, so a fresh evaluation at the same p differs by ~1. The stale baseline produced a gradient of ~1e6 that drove lambda hard to a bound and froze it at the warm-start (e.g. a monthly series settled at PTS(-2,D,N), AICc 525, instead of the near-optimal PTS(2,G,D), AICc 282). The lambda gradient is now taken as a central difference at the TOP of the gradient routine, from the clean state the objective leaves, before anything is perturbed; and the warm-start seed is kept out of the logistic's saturated tails. Shared C++ engine, so identical in R and Python. * Joint maximum-likelihood Box-Cox lambda estimation (`lambda_estim = "likelihood"`) now finds interior optima reliably. The engine optimised lambda against a hard clamp to its `[lower, upper]` bounds; the clamp made the likelihood flat beyond a bound, so the optimiser's gradient there was exactly zero and it got trapped at the bound -- pinning lambda at the floor / cap and missing the true optimum (e.g. a 74%-zero series settled at the zero floor 0.078, where the prediction interval exploded, instead of the likelihood optimum ~0.18). The lambda slot is now an unconstrained parameter mapped to `(lower, upper)` through a logistic, so the gradient stays informative everywhere; the optimiser locates interior optima and still goes to a bound only when the profile genuinely wants it. Shared C++ engine, so identical in R and Python. * New `lambda_estim` argument to `pts()` selects how the Box-Cox power is estimated when the power slot is `"Z"`: `"likelihood"` (new default -- estimate lambda jointly with the structural parameters by maximum likelihood in the engine), `"guerrero"` (the classical Guerrero (1993) variance- stabilisation screen on raw season-length blocks), or `"decomp-guerrero"` (Guerrero on an `msdecompose`-smoothed trend, the former default). The Guerrero screens minimise a coefficient of variation and ignore the likelihood, so they can pick a poor lambda -- on a zero-heavy series they may steer into the small-lambda region where the back-transform explodes. The likelihood estimator avoids this; it is now the default. * Zero-containing series no longer break joint lambda estimation. Box-Cox maps `y = 0` to `g(0) = -1/lambda`, which at `lambda = 0` is `-Inf` and silently drops the observation -- so the likelihood (and AICc) at small lambda was computed on fewer points and was not comparable across lambda, luring a likelihood search toward a degenerate small lambda. The joint-lambda lower bound is now the proper zero floor `log(2)/log(max(y))` (was a flat `1e-10`), keeping the search in the comparable, finite-sample region. On an intermittent tourism series this changed the auto lambda from ~0.09 (forecast blew up to ~2e5 against actuals ~7e3) to ~0.25 (forecast ~8e3). * New `biasadj` argument to `pts()` / `forecast()` (default `FALSE`). Point forecasts are now the back-transformed conditional MEDIAN by default; with `biasadj = TRUE` a second-order bias correction yields the conditional MEAN (as in `forecast::InvBoxCox(biasadj = TRUE)`). The mean is not robust under a small-lambda Box-Cox (its heavy right tail makes it explode), so the median is the safer default. Prediction-interval quantiles are unaffected. * The default information criterion for automatic model selection is now `AICc` (was `BICc`), matching `smooth::adam`. * Information criteria now charge for the estimated structural initial states, and `$nParam` is now an adam-style breakdown table. The initial level, slope (for local / global / damped trend), cycle, and seasonal states are diffuse / profiled out and so are genuine degrees of freedom; they were previously free in the IC, which let BICc / AICc under-penalise flexible seasonal and trend shapes. The estimated-initials count `ns(0) + ns(1) + ns(2)` is taken straight from the engine, so it is driven by `lags` (correct for multi-seasonal `lags = c(m1, m2, ...)`) with no hard-coded period sizes; the stationary ARMA states are excluded (their initial is the stationary distribution, not a free parameter; the ARMA *coefficients* are still counted). The engine adds the same quantity to its own model-selection criterion, so `pts(model = "ZZZ")` selection and the reported `nParam` agree. `$nParam` is now a 2 x 5 matrix mirroring `smooth::adam` -- rows `Estimated` / `Provided`, columns `nParamInternal`, `nParamXreg`, `nParamOccurrence`, `nParamScale`, `nParamAll`. `nparam()` returns the `[Estimated, nParamAll]` cell (the total DoF, unchanged). The estimated initials fold into `nParamInternal` exactly as adam does (there is no separate `nInitial` slot); the concentrated variance is `nParamScale = 1`; regressors are `nParamXreg`. Note `coef()` / `vcov()` / `confint()` still cover only the estimated coefficients, so `length(coef)` is smaller than `nparam()`. Shared C++ engine, so identical in R and Python. * Point forecasts (and the in-sample point forecast) are now the conditional MEAN under a Box-Cox transform, not the median. Back-transforming the Box-Cox-scale point forecast g^{-1}(mu) returns the conditional MEDIAN on the original scale; for lambda < 1 the inverse transform is convex so the MEDIAN is below the MEAN, and the reported `$mean` / forecast(...)$mean systematically under-forecast (e.g. ~8% low on a lambda = 0.43 series). A second-order bias correction is now applied, mean ~= g^{-1}(mu) * (1 + 0.5 * var * (1 - lambda) / (1 + lambda*mu)^2), with `var` the Box-Cox-scale forecast variance (matching forecast::InvBoxCox(biasadj = TRUE)). Prediction-interval quantiles are unchanged -- they are exact quantiles of the back-transformed distribution and must not be bias-corrected. At lambda = 1 there is no change. Shared logic, so R and Python agree. * Series containing zeros can now be variance-stabilised (and forecast non-negatively). Previously any zero forced lambda = 1 (raw scale): the msdecompose + Guerrero lambda screen bailed to 1 on `any(y <= 0)`, and even a user-supplied lambda != 1 returned a NaN log-likelihood, so zero-heavy / intermittent series were fit on the raw scale and routinely forecast negative values. Two fixes: - The BCnorm density now rejects only the genuinely-undefined corner (y <= 0 with lambda <= 0); y == 0 with lambda > 0 is finite (g(0) = -1/lambda, e.g. sqrt(0) = 0) and is allowed. - The lambda screen now runs on zero-containing series (it disqualifies only on NEGATIVE values), with a floor lambda >= log(2)/log(max(y)) so the transformed zero is no more extreme than the transformed maximum. A power transform in (0, 1) both fits intermittent series far better and guarantees non-negative forecasts (the inverse BoxCox can't go below zero). E.g. eurostat tourism series 267 moves from PTS(1,G,D) with a forecast dipping to -101, to PTS(0.13,N,D) forecasting in [0, 15]. Shared C++ engine for the density; the R and Python lambda screens are kept in parity. * AR / SAR coefficients are now reported in the standard Box-Jenkins convention (1 - phi B) y_t = (1 + theta B) e_t, matching stats::arima and the forecasting literature. The engine stores them internally in the (1 + phi B) form (which makes the seasonal convolution and the state-space companion work out), so the reported coefficient was the NEGATIVE of the standard one (e.g. a +0.7 AR(1) signal was reported as -0.7). The reported AR/SAR coefficients (and their covariance) are now flipped to the standard sign at the output boundary, and flipped back when a fitted object's coefficients are reused for forecasting, so the round-trip is exact and forecasts are unchanged. MA / SMA coefficients were already standard and are unaffected. Shared C++ engine, so identical in R and Python. * Strongly seasonal series no longer get a spurious large POSITIVE log-likelihood (and the resulting explosive forecast / wrong model selection). The concentrated likelihood was scaling the diffuse- initialisation determinant by the concentrated variance sigma^2. That is harmless near sigma^2 ~ 1, but when the optimiser reaches a degenerate corner -- a local-linear-trend whose observation variance collapses, so sigma^2 underflows to ~1e-117 -- the d_t diffuse terms injected ~ +1600 of bogus likelihood (e.g. logLik +986 on a series the model actually fits with sigma ~ 0.6, picking an exp-trend forecast that blows up to ~3.6e7). The diffuse determinant is now left UNSCALED by sigma^2 -- the exact-diffuse likelihood convention (Durbin-Koopman): only the n - d_t non-diffuse observations carry the variance scale. Such series now report a sane negative logLik (~ -737) and select a sensible seasonal model. logLik / information criteria of models with diffuse initialisation shift by a small amount (the diffuse determinant is now scale-free). Shared C++ engine, so identical to Python. * Concentrated variance can no longer come out negative (NaN standard errors / negative displayed variances). In the augmented Kalman likelihood (`llikAug`) the concentrated innovation variance is the residual sum of squares (after projecting out the diffuse initial states and regressors) divided by `n`. It used to be formed as the projection identity `RSS = v2F - sn'Sn^{-1}sn`, a subtraction of two large nearly equal numbers at a near-interpolating fit -- the classic unstable way to compute a sum of squares -- which lost all precision (catastrophic cancellation) and could flip NEGATIVE, making `innVariance`, the displayed variances (`ratio * innVariance`) and the parameter covariance non-positive. The RSS is now formed directly as the sum of squared GLS residuals `Sum (v_t - V_t·beta_hat)^2 / F_t`, a sum of non-negative terms that is structurally `>= 0` in floating point, so the negative-variance artifact cannot occur however degenerate the fit. (A genuinely interpolating fit still drives RSS toward 0; that residual degeneracy is a modelling matter for the disturbance-variance control, not a numerical defect.) * Forecasts no longer blow up when a disturbance variance collapses to zero. With the corrected optimiser, a flexible model on some series drives a variance log-parameter very negative, so `exp(2*p)` underflows to EXACTLY 0 in the system matrices. A zero disturbance variance makes the Kalman innovation variance `F_t = Z P Z' + H` zero, the gain `K = P Z'/F_t` becomes `0/0 = NaN`, the terminal state is `NaN`, and the forecast explodes (e.g. ~1e12 on a series whose data maxes near 1e6). The variance log-parameters (`typePar == 0`) are now floored at `var >= exp(-23) ~ 1e-10` inside `bsmMatrices` / `bsmMatricesTrue` -- numerically "zero" for the model but keeping the filter well-defined. Structural zeros (companion states) are untouched; well-identified models (variances far above the floor) are unaffected. * Outlier detection (`outliers = "use"`) fixed. Two bugs are addressed: - The disturbance smoother standardised the auxiliary residuals with a single matrix `pinv()` across all components; its tolerance treated any component whose variance was far below the largest (e.g. a near-zero observation variance) as singular and ZEROED it -- destroying the outlier signal there, so spikes went undetected. The smoother now emits the raw smoothed disturbance and standardises each component by its own empirical SD, which is scale-robust. - The forward/backward outlier search dropped a detected outlier whenever the no-outlier baseline had a "better" information criterion. When the baseline overfits a short series (the unbounded variance->0 likelihood singularity), its IC is artificially good and no genuine outlier model can beat it, so real, highly-significant outliers were silently removed. Acceptance now relies on the per-outlier significance test (the backward-deletion step already keeps only outliers whose augmented-KF coefficient t-stat clears the `level` z-threshold); the model with outliers is kept unless it fails to converge. * Estimation gradient corrected; models converge to better optima. The analytic log-likelihood gradient (`gradLlik`) used by the quasi-Newton optimiser was wrong for structural models: it left the baseline `Q`/`H` on the stale ratio scale from the previous objective call while building the perturbed `Q`/`H` on the absolute scale, so the finite-difference `dQ` mixed units and exploded by ~1/innVariance whenever the concentrated variance was small (e.g. log / near-deterministic data). It also normalised by `n - nMiss - d_t - 1` instead of the `nFinite` the objective averages over (a few-percent error on seasonal models). Both are fixed; the analytic gradient now matches a central-difference reference to ~6 digits across structural specs. Separately, the optimiser gained a **descent-direction safeguard**: when the BFGS inverse-Hessian loses positive-definiteness and yields a non-descent direction, it resets to steepest descent instead of stalling at a non-stationary point. Together these make estimation converge to materially higher likelihoods (e.g. on AirPassengers: PTS(0,N,T) +129, PTS(1,N,T) +94, PTS(1,L,T) +15 log-lik; every structural spec improved, none regressed). Models that previously appeared "degenerate" (the optimiser quitting after ~2 iterations with `Q-Newton: Unable to decrease` and near-zero variances) now fit properly. ARMA / damped / regressor / cycle models already used the numerical gradient and are unaffected by the analytic fix; they also benefit from the descent safeguard. * Parameter covariance (`vcov`, and hence `confint` / printed standard errors) is now reliable for weakly-identified and ill-conditioned models. Two changes: - the observed-information Hessian is computed with central second differences and a per-parameter, magnitude-scaled step (was one-sided forward differences with a fixed 1e-5 step for every parameter). This removes a first-order truncation bias and the catastrophic cancellation that made the Hessian noisy and irreproducible in flat directions. - when the observed Hessian is indefinite, numerically singular, or non-finite at the optimum -- typically a boundary / weakly-identified variance (e.g. a near-zero local-linear-trend variance) or an ARMA phi~theta near-cancellation -- the covariance falls back to the OPG / BHHH estimator (sum_t s_t s_t', the per-observation outer product of scores), which is positive semidefinite by construction. (For a fully degenerate fit, where the disturbance variances collapse to ~0 and the Kalman innovation variances are non-positive, neither estimator is defined and `vcov` stays NaN.) Previously such models could return a non-PSD "covariance" (implied correlations outside +/-1) and standard errors that swung by tens of percent under negligible floating-point reordering. Point estimates, logLik, and information criteria are unchanged. The OPG fallback is not yet available for models with external regressors (the augmented filter concentrates the regression coefficients and stores no per-observation innovations); those keep the improved Hessian. * Decoupled fit and forecast. A fitted `pts` object now caches the terminal state (final state, its covariance, the innovation variance, and the augmented-KF state for regressor models), so `forecast()` / `predict()` reuse it instead of re-filtering the whole series on every call -- O(h) instead of O(n*m^3). Forecasting a high-lag model is now effectively instant after the fit (~9x faster even at modest lags, more at high lags). Explanatory variables are handled adam-style: the auto-forecast produced when `pts(..., h > 0, holdout = TRUE)` now uses the held-out regressor rows (previously it was dropped and the forecast came back empty), and `forecast(model, h, newdata = ...)` supplies future regressors for horizons beyond the holdout. This also fixes a latent inconsistency: `pts()$forecast` and `forecast(pts(...))` previously disagreed slightly (the in-fit forecast used the estimation-device concentrated state; the forecast method re-filtered in absolute scale). They are now identical, and holdout accuracy uses the same canonical forecast. * Performance: high-lag models (e.g. `lags = 336`) are dramatically faster. The state-space engine now exploits the sparsity of the transition matrix (block-diagonal rotation / companion blocks, ~98% zeros) in the Kalman filter, the analytic gradient, and the smoother: - forward filter `T P T'` uses sparse `T` (O(m^2) vs O(m^3)); - the analytic-gradient backward recursion uses sparse `T` plus a rank-1 expansion of the `Lt = I - K Z` term (O(m^2)); - the state smoother (`components()`) skips the O(m^3) backward covariance recursion entirely, since its smoothed-variance output was never surfaced (fast state smoother). Point forecasts, fitted values, components, coefficients, ICs and parameter standard errors are unchanged to floating-point tolerance. (The dense/sparse Hessian now agrees because the covariance uses central differences with an OPG fallback -- see the `vcov` entry above; previously the cruder Hessian could diverge on ill-conditioned models.) * `pts()` auto-lambda screen replaced. The previous Brent-on-LT-AIC search (which fitted ~20 proxy state-space models per series) is now a single classical decomposition + Guerrero coefficient-of- variation minimisation. Steps: 1. `smooth::msdecompose(y, lags = m, type = "additive", smoother = "ma")` — centred moving average of order m, plus an averaged seasonal. 2. Within each non-overlapping block of length m, take `mu_b = mean(smoothed trend)` and `sd_b = sd(y - trend)` — the within-block dispersion *retains* the seasonal swing since that swing's scaling with level is the signal we want Guerrero to detect. 3. Pick lambda minimising CV(`sd_b * mu_b^(lambda-1)`) via `optimize()` over `[0, 2]`. Clipping at 0 eliminates the `-1/lambda` asymptote of the inverse Box-Cox (the source of Inf forecasts under the old screen); upper 2 is the standard FPP generous cap. No more proxy structural fits during lambda screening — the screen is now ~50x faster while correctly identifying log territory for textbook multiplicative cases (AirPassengers) and pinning lambda at 1 for spike-contaminated series where the old Brent search drifted into the explosive negative-lambda region. * Default information criterion in `pts()` switched from `"AICc"` to `"BICc"`. AICc's complexity penalty can be too soft for short Box-Cox series: structurally aggressive shapes (e.g. a damped or local-linear trend on top of a small inverse-BC exponent) win the IC by 1-2 units and then produce runaway forecasts when the slope latches onto a terminal spike. BICc's stiffer log(n) penalty resolves the tie in favour of the simpler shape on the cases that matter without changing the pick on series where the structural shape is clearly supported. Specify `ic = "AICc"` to recover the old default. * G (deterministic) trend now correctly counts its drift slope as an estimated parameter for IC purposes. The slope is concentrated out as a regressor by the engine, so `parLabels()` did not push a "Slope" entry and `nParam` was one short on G models. Fixed in both the C++ ident loop (`kFor` in `PTSmodel.h::estim`) and the R-side selector / post-fit accounting (`R/pts.R`, `R/pts-translate.R`). This pushes G candidates down the IC ranking by one DoF; AIC values printed in `verbose = TRUE` runs change accordingly. * `pts()` checks for negative values in `data` up front: if any are present it warns and pins the Box-Cox lambda to 1 (no transformation). Previously the same input reached the C++ engine and threw a low-level error. Explicit `lambda = 1` is left alone. * Removed the `auto.pts()` wrapper. It was redundant with `pts(data, model = "ZZZ", ...)`, which is the documented way to request fully automatic model selection. * `pts()` accepts seasonal SARMA on the irregular component via per-lag vectors in `orders` — e.g. `orders = list(ar = c(p, P), ma = c(q, Q), lags = c(1, s))` fits SARMA(p, q)(P, Q)_s. The seasonal lag may also be supplied through the top-level `lags` argument as `c(1, s)` (adam-style). Coefficients show up as `SAR(i)` / `SMA(i)` in print/summary. The seasonal polynomial is multiplied internally; the BFGS only optimises the free φ_i, Φ_j, θ_i, Θ_j coefficients. * `orders$select = TRUE` now also works for seasonal specs: grid-searches the (p, q, P, Q) tuple up to the supplied caps and picks the lowest-IC candidate. * `orders$select = TRUE` uses a residual-based two-stage strategy: fit the structural PTS once (no ARMA), then rank ARMA candidates by IC on the BC-scale residuals via `stats::arima()`. Two state-space fits in total instead of one-per-candidate, with the same IC criterion as before. * `pts()` return list now carries `$arma = list(ar = ..., ma = ...)` — flat per-lag concatenated AR / MA coefficient vectors mirroring smooth::adam's convention. * `orders$i` is silently dropped — PTS has no differencing. The returned `$orders` slot and the `orders()` accessor no longer include an `i` field; `orders$i` in user input is ignored. * ARMA initial conditions in the engine now use asymmetric ACF/PACF seeds (PTSmodel.h:3662), with a tie-breaker that prevents the BFGS from drifting onto the φ_i == θ_i cancellation manifold. Previously zero init left the optimiser either trivially "converged" (AR == MA to bit precision on AirPassengers) or stuck at the iteration limit (synthetic AR(1) signals recovered as AR ≈ -0.02 instead of ≈ 0.7). Pure AR(1) recovery now matches stats::arima to 3-4 decimals. * `orders$select = TRUE` grid uses an in-engine ARMA / SARMA fitter `UCompARMAC` for every candidate (including the seasonal cells); no `stats::arima` call anywhere in R. IC ranking and the final fit share the same MLE σ̂² definition, the same PACF parameter space, and the same DoF count (k = 1 + free AR + free MA), so the no-ARMA baseline ranks fairly against any ARMA candidate. * `findUCmodels()` filters the (damped trend, AR > 0) combinations — `srw` and `dt` model persistence via their α parameter, and pairing them with an AR term joint-identifies poorly. MA-only ARMA stays allowed for those trends. Replaces the previous blanket strip with a per-combination filter. * ARMA initial conditions in the engine use asymmetric ACF / PACF seeds (`PTSmodel.h:3662`), with a tie-breaker that pushes MA off the φ_i == θ_i cancellation manifold whenever the two seeds collide. Pure AR(1) recovery now matches `stats::arima` to 3-4 decimals; ARMA(2, 2) on log(AirPassengers) no longer collapses to AR == MA. muse v0.0.1.41005 (in development) ======= User-facing * Single user-facing entry point: pts(). The legacy ctll() and PTS/PTSforecast/MSOE API have been removed. * pts() now mirrors smooth::adam()'s calling convention — data, formula, ic, orders, h, holdout, regressors — and returns an object with adam-aligned slot names so plot.smooth and other smooth/greybox methods just work. * orders accepts both the full list(ar, ma, select) and a numeric shortcut c(p, q) (or c(p)) — the irregular component's ARMA(p,q) is wired through to the C++ engine via either form. * forecast.pts() supports prediction, simulated, and confidence intervals; one- or two-sided bounds; cumulative forecasts; and level as a vector (producing matrix bounds). * New / updated S3 methods: print, summary, plot (actuals + reordered states), fitted, residuals, predict, coef, vcov, logLik, nobs, sigma, AIC/BIC/AICc/BICc, rstandard, rstudent, pointLik, outlierdummy, accuracy, simulate, confint, update, initials, modelType, lags, orders, errorType. * summary.pts and print.pts now use an adam-style header/footer and print variance proportions; the analytically concentrated variance gets an analytical SE and a "*" marker. Estimation / likelihood * Box-Cox normal log-likelihood is now consistent across BFGS, AIC/BIC computation, and profile-lambda search. σ̂² uses the MLE divisor n (not REML n − k), and the Box-Cox Jacobian is summed over the same observations — this removes a state-dim-proportional bias that pushed the auto-selected λ toward 1. On AirPassengers, λ now lands near -0.3, matching forecast::BoxCox.lambda. * Profile-lambda (Brent + anchor snap to {-2, -1, -0.5, 0, 0.5, 1, 2}) is run per candidate during model selection. λ counts as +1 DoF only when the optimised value beats the nearest anchor. * Box-Cox transform uses exact equality at λ = 0 / λ = 1 (not threshold shortcuts), so AIC is continuous in λ across those points. Internals * C++ side reorganised: musecore.h hosts the SEXP-free dispatch; PTSmodel.h owns BSMclass (matrices, likelihood, ident, components); SSpace.h owns the generic Kalman filter / smoother / optimiser. * Dedicated forecastOnly command on UCompC re-uses cached parameters without re-estimating. * Box-Cox Jacobian shared between R and C++ via bcnorm.h / .inv_box_cox(); identical branches on both sides. * ARCHITECTURE.md documents call graphs, data structures, coupling rules, and key invariants; kept in sync with the code. muse v0.0.1 (Release date: 2025-02-19) ======= Changes: * Created the package * Diego uploaded the code for the ctll() and pts() functions. Ivan started preparing the documentation. * The working version of ctll() and the related methods, such as fitted(), residuals(), forecast() etc. * Simulated interval and cumulative forecasts in ctll()