% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lm.rrpp.r
\name{lm.rrpp}
\alias{lm.rrpp}
\title{Linear Model Evaluation with a Randomized Residual Permutation Procedure}
\usage{
lm.rrpp(f1, iter = 999, seed = NULL, int.first = FALSE, RRPP = TRUE,
  SS.type = c("I", "II", "III"), data = NULL, Cov = NULL,
  print.progress = TRUE, Parallel = FALSE, ...)
}
\arguments{
\item{f1}{A formula for the linear model (e.g., y~x1+x2).  Can also be a linear model fit
from \code{\link{lm}}.}

\item{iter}{Number of iterations for significance testing}

\item{seed}{An optional argument for setting the seed for random permutations of the resampling procedure.
If left NULL (the default), the exact same P-values will be found for repeated runs of the analysis (with the same number of iterations).
If seed = "random", a random seed will be used, and P-values will vary.  One can also specify an integer for specific seed values,
which might be of interest for advanced users.}

\item{int.first}{A logical value to indicate if interactions of first main effects should precede subsequent main effects}

\item{RRPP}{A logical value indicating whether residual randomization should be used for significance testing}

\item{SS.type}{A choice between type I (sequential), type II (hierarchical), or type III (marginal)
sums of squares and cross-products computations.}

\item{data}{A data frame for the function environment; see \code{\link{rrpp.data.frame}}.}

\item{Cov}{An optional argument for including a covariance matrix to address the non-independence
of error in the estimation of coefficients (via GLS).}

\item{print.progress}{A logical value to indicate whether a progress bar should be printed to the screen.
This is helpful for long-running analyses.}

\item{Parallel}{A logical value to indicate whether parallel processing should be used.  If TRUE, this argument
invokes forking of processor cores, using the \code{parallel} library.  This option is available only on Unix-like systems
and should be used only for rather long analyses (those that would normally take over 10 seconds on a single core).  Currently,
parallel processing is performed on all but one core, with no option to change the number of cores.  On Windows
platforms, the function automatically defaults to a single-core application.}

\item{...}{Arguments typically used in \code{\link{lm}}, such as weights or offset, passed on to
\code{rrpp.fit} for estimation of coefficients.}
}
\value{
An object of class \code{lm.rrpp} is a list containing the following:
\item{call}{The matched call.}
\item{LM}{Linear Model objects, including data (Y), coefficients, design matrix (X), sample size
(n), number of dependent variables (p), QR decomposition of the design matrix, fitted values, residuals,
weights, offset, model terms, model frame (data), random coefficients (through permutations),
random vector distances for coefficients (through permutations), and whether OLS or GLS was performed}
\item{ANOVA}{Analysis of variance objects, including the SS type, random SS outcomes, random MS outcomes,
random R-squared outcomes, random F outcomes, random Cohen's f-squared outcomes, P-values based on random F
outcomes, effect sizes for random outcomes, sample size (n), number of variables (p), and degrees of freedom for
model terms (df).  These objects are used to construct ANOVA tables.}
\item{PermInfo}{Permutation procedure information, including the number of permutations (perms), the method
of residual randomization (perm.method), and each permutation's sampling frame (perm.schedule), which
is a list of reordered sequences of 1:n, for how residuals were randomized.}
}
\description{
Function performs a linear model fit over many random permutations of data, using
a randomized residual permutation procedure.
}
\details{
The function fits a linear model using ordinary least squares (OLS) or generalized
least squares (GLS) estimation of coefficients over any number of random permutations of
the data.  A permutation procedure that randomizes vectors of residuals is employed.  This
procedure can randomize two types of residuals: residuals from null models or residuals from
an intercept model.  The latter is the same as randomizing full values, and is referred to as
a full randomization permutation procedure (FRPP); the former uses the residuals from null
models, which are defined by the type of sums of squares and cross-products (SSCP) sought in an
analysis of variance (ANOVA), and is referred to as a randomized residual permutation procedure (RRPP).
Types I, II, and III SSCPs are supported.

Users define the SSCP type, the permutation procedure type, whether a covariance matrix is included
(GLS estimation), and a few arguments related to computations.  Analytical results comprise observed linear model
results (coefficients, fitted values, residuals, etc.), random sums of squares (SS) across permutation iterations,
and other parameters for performing ANOVA and other hypothesis tests, using
empirically-derived probability distributions.

\code{lm.rrpp} emphasizes estimation of effect sizes with standard deviates of observed statistics
from distributions of random outcomes.  When performing ANOVA, using the \code{\link{anova}} function,
the effect type (statistic choice) can be varied.  See \code{\link{anova.lm.rrpp}} for more details.  Please
recognize that the type of SS must be chosen prior to running \code{lm.rrpp} and not when applying \code{\link{anova}}
to the \code{lm.rrpp} fit, as design matrices for the linear model must be created first.  Therefore, SS.type
is an argument for \code{lm.rrpp} and effect.type is an argument for \code{\link{anova.lm.rrpp}}.
}
\examples{

# Examples use geometric morphometric data
# See the package, geomorph, for details about obtaining such data

data("PupfishHeads")
names(PupfishHeads)

# Head Size Analysis (Univariate)-------------------------------------------------------

# Note: lm.rrpp works best if one avoids functions within formulas
# Thus,

PupfishHeads$logHeadSize <- log(PupfishHeads$headSize)
names(PupfishHeads)

fit <- lm.rrpp(logHeadSize ~ sex + locality/year, SS.type = "I", data = PupfishHeads)
summary(fit)
anova(fit, effect.type = "F") # Maybe not most appropriate
anova(fit, effect.type = "Rsq") # Change effect type, but still not most appropriate

# Mixed-model approach (most appropriate, as year sampled is a random effect):

anova(fit, effect.type = "F", error = c("Residuals", "locality:year", "Residuals"))

# Change to Type III SS

fit <- lm.rrpp(logHeadSize ~ sex + locality/year, SS.type = "III", data = PupfishHeads)
summary(fit)
anova(fit, effect.type = "F", error = c("Residuals", "locality:year", "Residuals"))
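
# Full randomization (FRPP) instead of RRPP
# A hedged sketch: per the RRPP argument described above, RRPP = FALSE should
# randomize residuals from an intercept model (i.e., the full values).
# The object name fitFRPP and iter = 199 are arbitrary choices for illustration.

fitFRPP <- lm.rrpp(logHeadSize ~ sex + locality/year, SS.type = "III", 
data = PupfishHeads, RRPP = FALSE, iter = 199, print.progress = FALSE)
fitFRPP$PermInfo$perm.method # randomization method recorded with the fit
anova(fitFRPP, effect.type = "F", error = c("Residuals", "locality:year", "Residuals"))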

# Coefficients Test

coef(fit, test = TRUE)

# Predictions (holding alternative effects constant)

sizeDF <- data.frame(sex = c("Female", "Male"))
rownames(sizeDF) <- c("Female", "Male")
sizePreds <- predict(fit, sizeDF)
summary(sizePreds)
plot(sizePreds)

# Diagnostics plots of residuals

plot(fit)
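
# Reproducibility of permutations
# A hedged sketch based on the description of the seed argument above.
# The object names fitA, fitB, and fitC and iter = 99 are arbitrary choices.

fitA <- lm.rrpp(logHeadSize ~ sex + locality/year, data = PupfishHeads, 
iter = 99, print.progress = FALSE) # seed = NULL (default)
fitB <- lm.rrpp(logHeadSize ~ sex + locality/year, data = PupfishHeads, 
iter = 99, print.progress = FALSE) # same call, repeated
fitC <- lm.rrpp(logHeadSize ~ sex + locality/year, data = PupfishHeads, 
iter = 99, seed = "random", print.progress = FALSE)

# Should be TRUE if seed = NULL behaves as documented
identical(fitA$PermInfo$perm.schedule, fitB$PermInfo$perm.schedule)

# Will usually be FALSE, as a random seed changes the sampling frames
identical(fitA$PermInfo$perm.schedule, fitC$PermInfo$perm.schedule)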

# Body Shape Analysis (Multivariate)----------------------------------------------------

data(Pupfish)
names(Pupfish)

# Note:

dim(Pupfish$coords) # highly multivariate!

Pupfish$logSize <- log(Pupfish$CS) # better to not have functions in formulas
names(Pupfish)

# Note: iter = 0 skips the RRPP iterations to keep this example fast;
# one should increase iter for real analyses.  Generally, iter = 999 will
# take less than 1s for this example with a modern computer.

fit <- lm.rrpp(coords ~ logSize + Sex*Pop, SS.type = "I", 
data = Pupfish, print.progress = FALSE, iter = 0) 
summary(fit, formula = FALSE)
anova(fit) 
coef(fit, test = TRUE)

# Predictions (holding alternative effects constant)

shapeDF <- expand.grid(Sex = levels(Pupfish$Sex), Pop = levels(Pupfish$Pop))
rownames(shapeDF) <- paste(shapeDF$Sex, shapeDF$Pop, sep = ".")
shapeDF

shapePreds <- predict(fit, shapeDF)
summary(shapePreds)
summary(shapePreds, PC = TRUE)

# Plot prediction

plot(shapePreds, PC = TRUE)
plot(shapePreds, PC = TRUE, ellipse = TRUE)

# Diagnostics plots of residuals

plot(fit)

# PC-plot of fitted values

groups <- interaction(Pupfish$Sex, Pupfish$Pop)
plot(fit, type = "PC", pch = 19, col = as.numeric(groups))

# Regression-like plot

plot(fit, type = "regression", reg.type = "PredLine", 
    predictor = Pupfish$logSize, pch=19,
    col = as.numeric(groups))

# Body Shape Analysis (Distances)----------------------------------------------------

D <- dist(Pupfish$coords) # inter-observation distances
length(D)
Pupfish$D <- D

# Note: iter = 0 skips the RRPP iterations to keep this example fast;
# one should increase iter for real analyses.  Generally, iter = 999 will
# take less than 1s for this example with a modern computer.

fitD <- lm.rrpp(D ~ logSize + Sex*Pop, SS.type = "I", 
data = Pupfish, print.progress = FALSE, iter = 0) 

# These should be the same:
summary(fitD, formula = FALSE)
summary(fit, formula = FALSE) 

# GLS Example (Univariate) ----------------------------------------------------------

data(PlethMorph)
fitOLS <- lm.rrpp(TailLength ~ SVL, data = PlethMorph)
fitGLS <- lm.rrpp(TailLength ~ SVL, data = PlethMorph, Cov = PlethMorph$PhyCov)

anova(fitOLS)
anova(fitGLS)

sizeDF <- data.frame(SVL = sort(PlethMorph$SVL))
plot(predict(fitOLS, sizeDF)) # Correlated error
plot(predict(fitGLS, sizeDF)) # Independent error

# GLS Example (Multivariate) ----------------------------------------------------------

# Combine the morphological variables into one response matrix
Y <- as.matrix(cbind(PlethMorph$TailLength,
PlethMorph$HeadLength,
PlethMorph$Snout.eye,
PlethMorph$BodyWidth,
PlethMorph$Forelimb,
PlethMorph$Hindlimb))
PlethMorph <- rrpp.data.frame(PlethMorph, Y=Y)
fitOLSm <- lm.rrpp(Y ~ SVL, data = PlethMorph)
fitGLSm <- lm.rrpp(Y ~ SVL, data = PlethMorph, Cov = PlethMorph$PhyCov)

anova(fitOLSm)
anova(fitGLSm)

plot(predict(fitOLSm, sizeDF), PC= TRUE) # Correlated error
plot(predict(fitGLSm, sizeDF), PC= TRUE) # Independent error
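
# Conceptual Sketch of RRPP (Base R) ------------------------------------------------

# A minimal sketch of the procedure described in Details, for one term (x2)
# with type I SS.  This is NOT the package's implementation.  Assumptions:
# the data are simulated, all object names are hypothetical, the observed
# outcome is counted as one permutation, and the effect size is a plain
# standard deviate, which may differ from what anova.lm.rrpp reports.

set.seed(1)
n <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 0.5 * x1 + 0.3 * x2 + rnorm(n)

null.fit <- lm(y ~ x1) # null model for x2: all terms preceding x2

# SS for x2 = reduction in residual SS when x2 is added to the null model
ss.x2 <- function(yy) {
  sum(resid(lm(yy ~ x1))^2) - sum(resid(lm(yy ~ x1 + x2))^2)
}

obs <- ss.x2(y)
iter <- 199
rand <- replicate(iter, {
  # randomize null-model residuals, then add them to null-model fitted values
  y.star <- fitted(null.fit) + sample(resid(null.fit))
  ss.x2(y.star)
})

ss.dist <- c(obs, rand) # empirical sampling distribution of SS
p.value <- mean(ss.dist >= obs) # proportion of outcomes at least as large
z <- (obs - mean(ss.dist)) / sd(ss.dist) # standard deviate of observed SS
c(SS = obs, Z = z, P = p.value)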
}
\references{
Anderson, M.J. 2001. A new method for non-parametric multivariate analysis of variance.
Austral Ecology. 26:32-46.

Anderson, M.J. and C.J.F. ter Braak. 2003. Permutation tests for multi-factorial analysis of variance.
Journal of Statistical Computation and Simulation. 73:85-113.

Collyer, M.L., D.J. Sekora, and D.C. Adams. 2015. A method for analysis of phenotypic change for phenotypes described
by high-dimensional data. Heredity. 115:357-365.

Adams, D.C. and M.L. Collyer. 2016.  On the comparison of the strength of morphological integration across morphometric
datasets. Evolution. 70:2623-2631.
}
\seealso{
\code{procD.lm} and \code{procD.pgls} within \code{geomorph}; \code{\link[stats]{lm}} for more on linear model fits.
}
\author{
Michael Collyer
}
\keyword{analysis}
