---
title: "Tweedie likelihood computations"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tweedie likelihood computations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(tweedie)
```

## Tweedie distributions

Tweedie distributions are exponential dispersion models, with a mean $\mu$ and a variance $\phi \mu^\xi$, for some dispersion parameter $\phi > 0$ and a power index $\xi$ (sometimes called $p$) that uniquely defines the distribution within the Tweedie family (for all real values of $\xi$ **not** between 0 and 1).

Special cases of the Tweedie distributions are:

* the *normal* distribution, with $\xi = 0$ (i.e., the variance is $\phi$ and not related to the mean);
* the *Poisson* distribution, with $\xi = 1$ and $\phi = 1$ (i.e., the variance is the same as the mean);
* the *gamma* distribution, with $\xi = 2$; and
* the *inverse Gaussian* distribution, with $\xi = 3$.

For all other values of $\xi$, the probability functions and distribution functions have no closed forms.


| Power index, $\xi$    | Distribution     | Support                       |
|-----------------------|------------------|-------------------------------|
| $\xi = 0$             | Normal           | $(-\infty, \infty)$           |
| $0 < \xi < 1$         | Do not exist     |                               |
| $\xi = 1$, $\phi = 1$ | Poisson          | Discrete: $(0, 1, 2, \dots)$  |
| $1 < \xi < 2$         | Poisson--gamma   | $[ 0, \infty)$                |
| $\xi = 2$             | gamma            | $(0, \infty)$                 |
| $\xi > 2$             | Skewed right     | $(0, \infty$)                 |
| $\xi = 3$             | inverse Gaussian | $(0, \infty)$                 |

For $\xi < 1$, applications are limited (non-existent so far?), but have support on the entire real line and $\mu > 0$.

For $1 < \xi < 2$, Tweedie distributions can be represented as a Poisson sum of gamma distributions. 
These distributions are continuous for $Y > 0$ but have a discrete mass at $Y = 0$.


## Examples

Likelihood computations are not need to fit a Tweedie generalized linear model, using the `tweedie()` family from the [**statmod** package](https://CRAN.R-project.org/package=statmod):

```{r TWexample}
library(statmod)

set.seed(96)
N <- 25
# Mean of the Poisson (lambda) and Gamma (shape/scale)
lambda <- 1.5
# Generating Compound Poisson-Gamma data manually
y <- replicate(N, {
  n_events <- rpois(1, lambda = lambda)
  if (n_events == 0) 0 else sum(rgamma(n_events, shape = 2, scale = 1))
})

mod.tw <- glm(y ~ 1, 
              family = statmod::tweedie(var.power = 1.5, link.power = 0) )
              # link.power = 0  means the log-link
```

However, likelihood computations are necessary in other situations.

### Generating random numbers

Random numbers from a Tweedie distribution are generated using **rtweedie()**:

```{r TWrandom}
tweedie::rtweedie(10, xi = 1.1, mu = 2, phi = 1)
```

### Plotting density and probability functions

Accurate density functions are generated using **dtweedie()**:


```{r TWplotsPDF}
y <- seq(0, 2, length = 100)
xi <- 1.1
mu <- 0.5
phi <- 0.4

twden <- tweedie::dtweedie(y, xi = xi, mu = mu, phi = phi)  
twdtn <- tweedie::ptweedie(y, xi = xi, mu = mu, phi = phi)

plot( twden[y > 0] ~ y[y > 0], 
      type ="l",
      lwd = 2,
      xlab = expression(italic(y)),
      ylab = "Density function")
points(twden[y==0] ~ y[y == 0],
      lwd = 2,
      pch = 19,
      xlab = expression(italic(y)),
      ylab = "Distribution function")
```

Accurate probability functions are generated using **ptweedie()**:

```{r TWplotsCDF}
plot(twdtn ~ y,
     type = "l",
      lwd = 2,
     ylim = c(0, 1),
      xlab = expression(italic(y)),
      ylab = "Distribution function")
```

However, the function **tweedie_plot()** is sometimes more convenient (especially for $1 < \xi \ < 2$, when a probability mass at $Y = 0$ is present):

```{r TWplotsPDF2}
tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi,
                      ylab = "Density function",
                      xlab = expression(italic(y)),
                      lwd = 2)
```

```{r TWplotsCDF2}
tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi, 
                      ylab = "Distribution function",
                      xlab = expression(italic(y)),
                      lwd = 2, 
                      ylim = c(0, 1), 
                      type = "cdf")
```

### Computing the quantile residuals:

Quantile residuals can be computed to assess the fit of a glm:

```{r TWqqplot}
library(tweedie)

qqnorm( statmod::qresid(mod.tw) )
```


# Estimating $\xi$

To use a Tweedie distribution in a glm, the value of $\xi$ is needed.
To find the most suitable value of $\xi$, **tweedie_profile()** can be used:

```{r TWprofile}
out <- tweedie::tweedie.profile(y ~ 1, 
                                xi.vec = seq(1.2, 1.8, by = 0.05), 
                                do.plot = TRUE)

# The estimated power index:
out$xi.max
```