--- title: "spefa: spatial stochastic frontier with fixed effects and endogeneity" author: "Massimo Giannini (University of Rome Tor Vergata, Department of Enterprise Engineering)" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{spefa: user guide} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE) ``` ## 1. What this package estimates `spefa` implements the maximum-likelihood estimator developed in > Giannini, M. (2025). *A spatial stochastic frontier model with fixed effects > and endogenous environmental variables.* **Spatial Economic Analysis**, > 20(3), 420–441. doi:10.1080/17421772.2024.2414962 The model is a stochastic production (or cost) frontier on panel data that **simultaneously** accounts for three features that the earlier literature treated separately: * **individual fixed effects** \(\alpha_i\), removed by first differencing (following Wang and Ho, 2010), which avoids the incidental-parameters problem and sidesteps the singularity of the within transformation in spatial panels; * **endogenous regressors and environmental variables**, corrected through a Gaussian control function obtained from a Cholesky factorisation of the error covariance (following Kutlu, Tran and Tsionas, 2020); * **spatial spillovers**, through a spatial-autoregressive (SAR) term \(\lambda W y\). For unit \(i\) and period \(t\) the frontier is $$ y_{it} = \alpha_i + \lambda \sum_j w_{ij} y_{jt} + x_{it}'\beta - u_{it} + v_{it}, \qquad u_{it} = h_{it}\,u_i^{*}, $$ with \(h_{it} = \exp(z_{it}'\phi) > 0\) a multiplicative scaling and a time-invariant \(u_i^{*} \sim N^{+}(\mu,\sigma_u^2)\) (half-normal when \(\mu = 0\)). Endogenous variables \(q_j\) obey a reduced form \(q_{j,it} = z_{j,it}'\delta_j + \varepsilon_{j,it}\), and endogeneity is the correlation \(\rho_j = \mathrm{corr}(\varepsilon_j, v)\). The full derivation of the likelihood is in Giannini (2025); the package implements it in closed form and maximises it numerically. The implementation depends only on base R (the `stats` package). ## 2. Installation ```{r} install.packages("spefa_0.1.0.tar.gz", repos = NULL, type = "source") library(spefa) ``` ## 3. The data you supply The estimator needs a **balanced** panel and a spatial weight matrix: * a `data.frame` in long format with a unit identifier and a time identifier; * the outcome `y` and the frontier regressors `X` (these may include endogenous variables); * for each endogenous variable, one or more **instruments**; * the variables that enter the inefficiency **scaling** \(h_{it}\); * an \(N \times N\) weight matrix `W`, with rows and columns ordered by the **sorted unit id**, and (typically) row-normalised. ## 4. The `spefa()` function and its options ```{r} spefa(formula, data, index, W, endogenous = NULL, scaling = NULL, mu = FALSE, control = list(maxit = 500, reltol = 1e-9)) ``` | Argument | Meaning | |---|---| | `formula` | Frontier equation, e.g. `y ~ x1 + q1`. The intercept is removed by first differencing, so it is irrelevant. Endogenous regressors are listed here like any other regressor. | | `data` | A balanced panel `data.frame`. | | `index` | Length-2 character vector `c(unit, time)`, e.g. `c("id","time")`. | | `W` | The \(N\times N\) spatial weight matrix (sorted-id order). | | `endogenous` | Named list mapping each endogenous variable to its instruments, e.g. `list(q1 = ~ z1, q2 = ~ z2 + z3)`. Use `NULL` for no endogeneity (the model is then a plain spatial SF with fixed effects). An endogenous variable may appear in the frontier, in the scaling, or both. | | `scaling` | One-sided formula of the variables entering \(h_{it}=\exp(z_{it}'\phi)\), e.g. `~ x2 + q2`. There is **no intercept** (it is absorbed into \(\sigma_u\)). Needs at least one time-varying term. | | `mu` | `FALSE` (default) gives a half-normal inefficiency (\(\mu=0\)); `TRUE` estimates the truncation point \(\mu\) (truncated-normal). | | `control` | List passed to `stats::optim` (`maxit`, `reltol`). | ## 5. What is returned, and the available methods `spefa()` returns an object of class `"spefa"`. The estimated parameter vector contains the frontier slopes \(\beta\), the instrument slopes \(\delta_j\), the scaling coefficients \(\phi\), the reduced-form variances \(\sigma^2_{\varepsilon_j}\), the endogeneity correlations \(\rho_j\), \(\sigma^2_v\), \(\sigma^2_u\), (optionally \(\mu\)) and the spatial parameter \(\lambda\). | Method | Returns | |---|---| | `summary(fit)` | Coefficient table with standard errors, \(z\)-statistics and \(p\)-values; log-likelihood, AIC, BIC; convergence code; \(\lambda\) and the endogeneity correlations \(\rho\). | | `coef(fit)`, `vcov(fit)` | Point estimates and the (delta-method) covariance matrix. | | `logLik(fit)`, `AIC(fit)`, `BIC(fit)`, `nobs(fit)` | Standard fit statistics. | | `efficiency(fit, spatial = TRUE)` | Conditional-mean (Jondrow et al., 1982) inefficiency, optionally rescaled by the spatial multiplier \((I-\lambda W)^{-1}\). Returns the \(N\times T\) inefficiency matrix `u`, the efficiency matrix `eff = exp(-u)`, and the per-unit `Eu_i`. | | `impacts(fit)` | Direct, indirect and total marginal effects (LeSage and Pace, 2009) of each frontier regressor through \((I-\lambda W)^{-1}\). | Standard errors come from the Hessian at the optimum (computed on the unconstrained scale) propagated to the natural parameters by the delta method. ## 6. A worked example A small demo panel (`spefademo`, \(N = 40\), \(T = 12\)) ships with the package and is used here. The data-generating values are \(\beta=(0.5,0.5)\), \(\delta=(1,1)\), \(\phi=(0.3,0.3)\), \(\sigma^2_\varepsilon=0.09\), \(\sigma^2_v=0.04\), \(\rho=(0.5,0.5)\), \(\sigma^2_u=0.09\), \(\lambda=0.5\); `q1` and `q2` are endogenous, `q1` enters the frontier and `q2` the scaling. ```{r} library(spefa) data(spefademo) fit <- spefa(y ~ x1 + q1, data = spefademo, index = c("id", "time"), W = spefaW, endogenous = list(q1 = ~ z1, q2 = ~ z2), scaling = ~ x2 + q2) summary(fit) ``` Indicative output from a larger sample (\(N=150\), to show parameter recovery; the bundled \(N=40\) panel gives the same point estimates with wider standard errors): ``` Estimate Std.Error z value Pr(>|z|) x1 0.5002 0.0027 182.2 <2e-16 *** q1 0.5007 0.0032 155.2 <2e-16 *** delta.q1.z1 0.9991 0.0056 179.8 <2e-16 *** delta.q2.z2 1.0029 0.0054 186.6 <2e-16 *** x2 0.2995 0.0120 24.9 <2e-16 *** q2 0.3051 0.0120 25.5 <2e-16 *** rho.q1 0.4763 0.0134 35.5 <2e-16 *** rho.q2 0.5095 0.0130 39.2 <2e-16 *** sigma2_v 0.0391 0.0010 38.5 <2e-16 *** sigma2_u 0.0879 0.0162 5.4 5e-08 *** lambda 0.5027 0.0044 114.5 <2e-16 *** Spatial parameter lambda = 0.5027 Endogeneity correlations rho (q1, q2): 0.476, 0.509 ``` ```{r} impacts(fit) # direct / indirect / total marginal effects ef <- efficiency(fit) # spatially-corrected technical efficiency summary(as.vector(ef$eff)) ``` ## 7. Reading the results * **Endogeneity.** A \(\rho_j\) statistically different from zero signals endogeneity of variable \(j\); the \(z\)-test in the coefficient table is a direct test of \(H_0:\rho_j=0\). When all \(\rho_j=0\) the control-function correction vanishes and the estimator collapses to the exogenous case. * **Spatial dependence.** \(\lambda\) measures the strength of spillovers; the `impacts()` table splits each regressor's effect into a direct (own-unit) and an indirect (spillover) component, with total \(\approx \beta_k/(1-\lambda)\) under row-normalised `W`. * **Efficiency.** `efficiency(fit)$eff` gives unit-by-period technical efficiency in \((0,1]\); with `spatial = TRUE` the scores incorporate neighbours' inefficiency through \((I-\lambda W)^{-1}\). ## 8. Practical notes and identification * The panel must be **balanced**, and the rows/columns of `W` must follow the sorted unit id used in `data`. * The inefficiency parameters \(\phi\) and \(\sigma_u\) are identified only when the **scaling variables vary over time** (otherwise first differencing removes the inefficiency) and when the inefficiency is non-negligible. If the inefficiency is very small, expect \(\phi\) and \(\sigma_u\) to be weakly identified. * The endogeneity correction maintains the usual control-function assumptions: joint normality of the structural and reduced-form errors, and inefficiency independent of the regressors. * The likelihood evaluates the reduced-form (instrument) density on the data and counts the Gaussian normalising constant once per unit; the truncated-normal terms use the log-CDF for numerical stability. ## References Giannini, M. (2025). A spatial stochastic frontier model with fixed effects and endogenous environmental variables. *Spatial Economic Analysis*, 20(3), 420–441. Jondrow, J., Lovell, C. A. K., Materov, I. S., & Schmidt, P. (1982). On the estimation of technical inefficiency in the stochastic frontier production function model. *Journal of Econometrics*, 19(2–3), 233–238. Kutlu, L., Tran, K. C., & Tsionas, M. G. (2020). A spatial stochastic frontier model with endogenous frontier and environmental variables. *European Journal of Operational Research*, 286(1), 389–399. LeSage, J., & Pace, R. K. (2009). *Introduction to Spatial Econometrics*. CRC Press. Wang, H.-J., & Ho, C.-W. (2010). Estimating fixed-effect panel stochastic frontier models by model transformation. *Journal of Econometrics*, 157(2), 286–296.