% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/overglm3.R
\name{zeroinf}
\alias{zeroinf}
\title{Zero-Inflated Regression Models for Count Data with Excess Zeros}
\usage{
zeroinf(
  formula,
  data,
  subset,
  na.action = na.omit(),
  weights,
  family = "poi(log)",
  zero.link = c("logit", "probit", "cloglog", "cauchit", "log"),
  reltol = 1e-13,
  start = list(counts = NULL, zeros = NULL),
  ...
)
}
\arguments{
\item{formula}{a \code{Formula} expression of the form \code{response ~ x1 + x2 + ... | z1 + z2 + ...}, which is a symbolic description
of the linear predictors of the models to be fitted to \eqn{\mu} and \eqn{\pi}, respectively. See \link{Formula} documentation. If a
formula of the form \code{response ~ x1 + x2 + ...} is supplied, then the same regressors are used in both components. This is equivalent to
\code{response ~ x1 + x2 + ... | x1 + x2 + ...}.}

\item{data}{an (optional) data frame in which to look for the variables involved in the \code{formula} expression,
as well as for variables specified in the arguments \code{weights} and \code{subset}.}

\item{subset}{an (optional) vector specifying a subset of observations to be used in the fitting process.}

\item{na.action}{a function that indicates what should happen when the data contain \code{NA}s. By default, \code{na.action} is set to \code{na.omit()}.}

\item{weights}{an (optional) vector of positive "prior weights" to be used in the fitting process. The length of
\code{weights} should be the same as the number of observations. By default, \code{weights} is set to be a vector of 1s.}

\item{family}{an (optional) character string specifying the distribution used to describe the response variable, as well as the link
function to be used in the model for \eqn{\mu}. The following distributions are supported: (zero-inflated) negative binomial I ("nb1"),
(zero-inflated) negative binomial II ("nb2"), (zero-inflated) negative binomial ("nbf"), and (zero-inflated) Poisson ("poi").
The available link functions are the same as those available in Poisson models via \link{glm}. See \link{family} documentation. By
default, \code{family} is set to Poisson with log link.}

\item{zero.link}{an (optional) character string specifying the link function to be used in the model for \eqn{\pi}.
The available link functions are the same as those available in binomial models via \link{glm}. See \link{family} documentation.
By default, \code{zero.link} is set to "logit".}

\item{reltol}{an (optional) positive value which represents the \emph{relative convergence tolerance} for the BFGS method in \link{optim}.
By default, \code{reltol} is set to be 1e-13.}

\item{start}{an (optional) list with two components named "counts" and "zeros", which specify the starting values to be used in the
iterative process to obtain the estimates of the parameters in the linear predictors of the models for \eqn{\mu}
and \eqn{\pi}, respectively.}

\item{...}{further arguments passed to or from other methods.}
}
\value{
An object of class \emph{zeroinflation} in which the main results of the model fitted to the data are stored, i.e., a
list with components including
\tabular{ll}{
\code{coefficients} \tab a list with elements "counts" and "zeros" containing the parameter estimates\cr
                    \tab from the respective models,\cr
\tab \cr
\code{fitted.values}\tab a list with elements "counts" and "zeros" containing the estimates of \eqn{\mu_1,\ldots,\mu_n}\cr
                    \tab and \eqn{\pi_1,\ldots,\pi_n}, respectively,\cr
\tab \cr
\code{start}        \tab a vector containing the starting values for all parameters in the model,\cr
\tab \cr
\code{prior.weights}\tab a vector containing the case weights used,\cr
\tab \cr
\code{offset}       \tab a list with elements "counts" and "zeros" containing the offset vectors, if any, \cr
                    \tab from the respective models,\cr
\tab \cr
\code{terms}        \tab a list with elements "counts", "zeros" and "full" containing the terms objects for \cr
                    \tab the respective models,\cr
\tab \cr
\code{loglik}       \tab the value of the log-likelihood function evaluated at the parameter estimates and\cr
                    \tab the observed data,\cr
\tab \cr
\code{estfun}       \tab a list with elements "counts" and "zeros" containing the estimating functions \cr
                    \tab evaluated at the parameter estimates and the observed data for the respective models,\cr
\tab \cr
\code{formula}      \tab the formula,\cr
\tab \cr
\code{levels}       \tab the levels of the categorical regressors,\cr
\tab \cr
\code{contrasts}    \tab a list with elements "counts" and "zeros" containing the contrasts corresponding\cr
                    \tab to levels from the respective models,\cr
\tab \cr
\code{converged}    \tab a logical indicating successful convergence,\cr
\tab \cr
\code{model}        \tab the full model frame,\cr
\tab \cr
\code{y}            \tab the response count vector,\cr
\tab \cr
\code{family}       \tab a list with elements "counts" and "zeros" containing the \link{family} objects used\cr
                    \tab  in the respective models,\cr
\tab \cr
\code{linear.predictors} \tab  a list with elements "counts" and "zeros" containing the estimates of \cr
                         \tab  \eqn{g(\mu_1),\ldots,g(\mu_n)} and \eqn{h(\pi_1),\ldots,h(\pi_n)}, respectively,\cr
\tab \cr
\code{R}            \tab a matrix with the Cholesky decomposition of the inverse of the variance-covariance\cr
                    \tab matrix of all parameters in the model,\cr
\tab \cr
\code{call}         \tab the original function call.\cr
}
}
\description{
Fits a zero-inflated (Poisson or negative binomial) regression model to deal with excess zeros in count data.
}
\details{
The zero-inflated count distributions may be obtained as a mixture of a count
distribution and a point mass at zero, with a Bernoulli mixing indicator. Indeed, if \eqn{Y} is a count random
variable such that \eqn{Y|\nu=1} is 0 with probability 1
and \eqn{Y|\nu=0} ~ Poisson\eqn{(\mu)}, where \eqn{\nu} ~ Bernoulli\eqn{(\pi)}, then
\eqn{Y} is distributed according to the Zero-Inflated Poisson distribution, denoted here as
ZIP\eqn{(\mu,\pi)}.
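
As an illustrative sketch that follows from the mixture definition above (it is not taken from the package source), the probability mass function of ZIP\eqn{(\mu,\pi)} may be written as
\deqn{P(Y=0)=\pi+(1-\pi)e^{-\mu}, \qquad P(Y=y)=(1-\pi)e^{-\mu}\mu^{y}/y!, \quad y=1,2,\ldots,}
so that \eqn{E(Y)=(1-\pi)\mu} and \eqn{Var(Y)=(1-\pi)\mu(1+\pi\mu)}. The variance-to-mean ratio is therefore \eqn{1+\pi\mu}, which exceeds 1 whenever \eqn{\pi>0} and \eqn{\mu>0}, so zero inflation by itself induces overdispersion relative to the Poisson distribution.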

Similarly, if \eqn{Y} is a count random variable such that \eqn{Y|\nu=1} is 0 with probability 1
and \eqn{Y|\nu=0} ~ NB\eqn{(\mu,\phi,\tau)}, where \eqn{\nu} ~ Bernoulli\eqn{(\pi)}, then
\eqn{Y} is distributed according to the Zero-Inflated Negative Binomial distribution, denoted here as
ZINB\eqn{(\mu,\phi,\tau,\pi)}. The Zero-Inflated Negative Binomial I \eqn{(\mu,\phi,\pi)} and
Zero-Inflated Negative Binomial II \eqn{(\mu,\phi,\pi)} distributions are special cases of ZINB when
\eqn{\tau=0} and \eqn{\tau=-1}, respectively.

The "counts" model may be expressed as \eqn{g(\mu_i)=x_i^{\top}\beta} for \eqn{i=1,\ldots,n}, where
\eqn{g(\cdot)} is the link function specified in the argument \code{family}. Similarly, the "zeros" model may
be expressed as \eqn{h(\pi_i)=z_i^{\top}\gamma} for \eqn{i=1,\ldots,n}, where \eqn{h(\cdot)} is the
link function specified in the argument \code{zero.link}. The model parameters are estimated by maximum
likelihood, that is, by maximizing the log-likelihood
function using the BFGS method available in the routine \link{optim}. The accuracy and speed of the BFGS
method are improved because analytical rather than numerical derivatives are used. The estimate
of the variance-covariance matrix is obtained as minus the inverse of the (analytical) Hessian matrix
of the log-likelihood function evaluated at the parameter estimates and the observed data.
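
As a sketch for the ZIP case with unit prior weights and no offsets (assumptions made here for brevity; the negative binomial and weighted cases are not written out), the log-likelihood function being maximized can be expressed as
\deqn{\ell(\beta,\gamma)=\sum_{i: y_i=0}\log[\pi_i+(1-\pi_i)e^{-\mu_i}]+\sum_{i: y_i>0}[\log(1-\pi_i)-\mu_i+y_i\log(\mu_i)-\log(y_i!)],}
where \eqn{\mu_i=g^{-1}(x_i^{\top}\beta)} and \eqn{\pi_i=h^{-1}(z_i^{\top}\gamma)}. For instance, under the default log and logit links, \eqn{\mu_i=\exp(x_i^{\top}\beta)} and \eqn{\pi_i=\exp(z_i^{\top}\gamma)/[1+\exp(z_i^{\top}\gamma)]}.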

A set of standard extractor functions for fitted model objects is available for objects of class \emph{zeroinflation},
including methods for generic functions such as \link{print}, \link{summary}, \link{model.matrix}, \link{estequa},
\link{coef}, \link{vcov}, \link{logLik}, \link{fitted}, \link{confint}, \link{AIC}, \link{BIC} and
\link{predict}. In addition, the model fitted to the data may be assessed using functions such as
\link{anova.zeroinflation}, \link{residuals.zeroinflation}, \link{dfbeta.zeroinflation},
\link{cooks.distance.zeroinflation} and \link{envelope.zeroinflation}.
}
\examples{
####### Example 1: Roots Produced by the Columnar Apple Cultivar Trajan
data(Trajan)
fit1 <- zeroinf(roots ~ photoperiod, family="nbf(log)", zero.link="logit", data=Trajan)
summary(fit1)

####### Example 2: Self-diagnosed ear infections in swimmers
data(swimmers)
fit2 <- zeroinf(infections ~ frequency | location, family="nb1(log)", data=swimmers)
summary(fit2)

####### Example 3: Article production by graduate students in biochemistry PhD programs
bioChemists <- pscl::bioChemists
fit3 <- zeroinf(art ~ fem + kid5 + ment | ment, family="nb1(log)", data = bioChemists)
summary(fit3)

}
\references{
Cameron, A.C. and Trivedi, P.K. 1998. \emph{Regression Analysis of Count Data}. New York:
            Cambridge University Press.

Lambert, D. 1992. Zero-Inflated Poisson Regression, with an Application to Defects in
            Manufacturing. \emph{Technometrics} 34, 1-14.

Garay, A.M., Hashimoto, E.M., Ortega, E.M.M. and Lachos, V. 2011. On estimation and
            influence diagnostics for zero-inflated negative binomial regression models. \emph{Computational
            Statistics & Data Analysis} 55, 1304-1318.
}
\seealso{
\link{overglm}, \link{zeroalt}
}
