% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/formulas.R
\name{ml_prepare_response_features_intercept}
\alias{ml_prepare_features}
\alias{ml_prepare_inputs}
\alias{ml_prepare_response_features_intercept}
\title{Pre-process the Inputs to a Spark ML Routine}
\usage{
ml_prepare_response_features_intercept(x = NULL, response, features,
  intercept, envir = parent.frame(),
  categorical.transformations = new.env(parent = emptyenv()))

ml_prepare_features(x, features, envir = parent.frame())
}
\arguments{
\item{x}{An object coercible to a Spark DataFrame (typically, a
\code{tbl_spark}).}

\item{response}{The name of the response vector (as a length-one character
vector), or a formula, giving a symbolic description of the model to be
fitted. When \code{response} is a formula, it is used in preference to other
parameters to set the \code{response}, \code{features}, and \code{intercept}
parameters (if available). Currently, only simple linear combinations of
existing parameters are supported; e.g. \code{response ~ feature1 + feature2 + ...}.
The intercept term can be omitted by using \code{- 1} in the model formula.}

\item{features}{The names of the features (terms) to use for the model fit.}

\item{intercept}{Boolean; should the model be fit with an intercept term?}

\item{envir}{The \R environment in which the \code{response}, \code{features}
and \code{intercept} bindings should be mutated (typically, the parent frame).}

\item{categorical.transformations}{An \R environment used to record which
categorical variables were binarized in this procedure. Categorical
variables that are included in the model formula will be transformed into
binary variables, and the generated mappings will be stored in this
environment.}
}
\description{
Pre-process / normalize the inputs typically passed to a
Spark ML routine.
}
\details{
Pre-processing of these inputs typically involves:

\enumerate{
\item Handling the case where \code{response} is itself a formula
      describing the model to be fit, thereby extracting the names
      of the \code{response} and \code{features} to be used,
\item Splitting categorical features into dummy variables (so they
      can easily be accommodated and specified in the underlying
      Spark ML model fit),
\item Mutating the associated variables \emph{in the specified environment}.
}

Take note of the last point: mutating bindings in the caller's environment
is useful in practice, but the behavior can be very surprising if you are
not expecting it.
}
\examples{
\dontrun{
# note that ml_prepare_features, by default, mutates the 'features'
# binding in the same environment in which the function was called
local({
   ml_prepare_features(features = ~ x1 + x2 + x3)
   print(features) # c("x1", "x2", "x3")
})
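
# The following is a minimal base-R sketch of the formula handling and
# environment mutation described in the details. It is an illustration
# only, not sparklyr's actual implementation; 'sketch_prepare' is a
# hypothetical helper name, and only base R / stats functions are used.
sketch_prepare <- function(formula, envir = parent.frame()) {
  tt <- terms(formula)
  # a two-sided formula has length 3; its second element is the response
  response <- if (length(formula) == 3) deparse(formula[[2]]) else NULL
  # write the extracted values into the chosen environment
  assign("response", response, envir = envir)
  assign("features", attr(tt, "term.labels"), envir = envir)
  assign("intercept", attr(tt, "intercept") == 1, envir = envir)
  invisible(NULL)
}

# Using an explicit environment makes the mutation easy to inspect.
e <- new.env()
sketch_prepare(y ~ x1 + x2 - 1, envir = e)
e$response  # "y"
e$features  # c("x1", "x2")
e$intercept # FALSE, because the formula contains '- 1'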
}
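
# A base-R sketch of splitting a categorical column into 0/1 dummy
# columns while recording the generated mapping in an environment, in
# the spirit of the 'categorical.transformations' argument. This is an
# illustration only, not sparklyr's actual implementation:
# 'sketch_binarize' and the '<column>_<level>' naming scheme are
# hypothetical, and the real routine may encode levels differently.
sketch_binarize <- function(df, column,
                            mappings = new.env(parent = emptyenv())) {
  lvls <- sort(unique(as.character(df[[column]])))
  new_names <- paste(column, lvls, sep = "_")
  for (i in seq_along(lvls)) {
    # one indicator column per level
    df[[new_names[[i]]]] <- as.integer(df[[column]] == lvls[[i]])
  }
  # record which columns were generated for this categorical variable
  assign(column, new_names, envir = mappings)
  df
}

mappings <- new.env(parent = emptyenv())
out <- sketch_binarize(data.frame(color = c("red", "blue", "red")),
                       "color", mappings)
names(out)             # "color" "color_blue" "color_red"
get("color", mappings) # "color_blue" "color_red"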
}

