Help for package sommer

Type:

Package

Title:

Solving Mixed Model Equations in R

Version:

4.4.4

Date:

2025-11-19

Maintainer:

Giovanny Covarrubias-Pazaran <cova_ruber@live.com.mx>

Description:

Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.

Depends:

R (≥ 3.5.0), Matrix (≥ 1.1.1), methods, stats, MASS, crayon, enhancer

LazyLoad:

yes

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Imports:

Rcpp (≥ 0.12.19)

BugReports:

https://github.com/covaruber/sommer/issues

URL:

https://github.com/covaruber/sommer

LinkingTo:

Rcpp, RcppArmadillo, RcppProgress

Suggests:

rmarkdown, knitr, plyr, parallel, orthopolynom, RSpectra, lattice, testthat (≥ 3.0.0)

VignetteBuilder:

knitr

Config/testthat/edition:

NeedsCompilation:

yes

Packaged:

2025-11-19 16:16:34 UTC; giovannycovarrubias

Author:

Giovanny Covarrubias-Pazaran

[aut, cre]

Repository:

CRAN

Date/Publication:

2025-11-26 06:20:07 UTC

Additive relationship matrix

Description

Calculates the realized additive relationship matrix. Currently is the C++ implementation of van Raden (2008).

Usage

A.mat(X,min.MAF=0,return.imputed=FALSE)

Arguments

X

Matrix (n \times m) of unphased genotypes for n lines and m biallelic markers, coded as {-1,0,1}. Fractional (imputed) and missing values (NA) are allowed.

min.MAF

Minimum minor allele frequency. The A matrix is not sensitive to rare alleles, so by default only monomorphic markers are removed.

return.imputed

When TRUE, the imputed marker matrix is returned.

Details

For vanraden method: the marker matrix is centered by subtracting column means M= X - ms where ms is the coumn means. Then A=M M'/c, where c = \sum_k{d_k}/k, the mean value of the diagonal values of the M M' portion.

Value

If return.imputed = FALSE, the n \times n additive relationship matrix is returned.

If return.imputed = TRUE, the function returns a list containing

$A: the A matrix
$X: the imputed marker matrix

References

Endelman, J.B., and J.-L. Jannink. 2012. Shrinkage estimation of the realized relationship matrix. G3:Genes, Genomes, Genetics. 2:1405-1413. doi: 10.1534/g3.112.004259

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples

####=========================================####
#### random population of 200 lines with 1000 markers
####=========================================####
X <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
  X[i,] <- ifelse(runif(1000)<0.5,-1,1)
}

A <- A.mat(X)

####=========================================####
#### take a look at the Genomic relationship matrix 
#### (just a small part)
####=========================================####
# colfunc <- colorRampPalette(c("steelblue4","springgreen","yellow"))
# hv <- heatmap(A[1:15,1:15], col = colfunc(100),Colv = "Rowv")
# str(hv)

Autocorrelation matrix of order 1.

Description

Creates an autocorrelation matrix of order one with parameters specified.

Usage

AR1(x,rho=0.25)

Arguments

x

vector of the variable to define the factor levels for the AR1 covariance structure.

rho

rho value for the matrix.

Details

Specially useful for constructing covariance structures for rows and ranges to capture better the spatial variation trends in the field. The rho value is assumed fixed and values of the variance component will be optimized through REML.

Value

If everything is defined correctly the function returns:

$nn: the correlation matrix

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples

x <- 1:4
R1 <- AR1(x,rho=.25)
image(R1)

data(DT_sleepstudy, package="enhancer")
DT <- DT_sleepstudy
head(DT)

# define the correlation between Days
D = with(DT, AR1(Days, rho=0.5))
subs = unique(DT$Subject)
# define the correlation between Subjects
S = diag(length(subs))
rownames(S) <- colnames(S) <- subs
# make the kronecker product
DS = kronecker(D,S, make.dimnames = TRUE)
# form the covariance matrix between units 
# this is assumes correlation between timepoints
DT$ds <- paste(DT$Days, DT$Subject, sep=":")
DS <- DS[DT$ds, DT$ds]
colnames(DS) <- rownames(DS) <- paste0("u",1:nrow(DS))
# fit the residual model
head(DT)

fm2 <- mmes(Reaction ~ Days,
            random= ~ Subject, 
            rcov = ~vsm(ism(units), Gu=DS), # equivalent to Subject:ar1(Days)
            data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp

# the matrix D can take any form: AR1, ARMA, or a custom correlation matrix

Autocorrelation Moving average.

Description

Creates an ARMA matrix of order one with parameters specified.

Usage

ARMA(x, rho=0.25, lambda=0.25)

Arguments

x

vector of the variable to define the factor levels for the ARMA covariance structure.

rho

rho value for the matrix.

lambda

dimensions of the square matrix.

Details

Value

If everything is defined correctly the function returns:

$nn: the correlation matrix

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples

x <- 1:4
R1 <- ARMA(x,rho=.25,lambda=0.2)
image(R1)

Compound symmetry matrix

Description

Creates a compound symmetry matrix with parameters specified.

Usage

CS(x, rho=0.25)

Arguments

x

vector of the variable to define the factor levels for the ARMA covariance structure.

rho

rho value for the matrix.

Details

Value

If everything is defined correctly the function returns:

$nn: the correlation matrix

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples

x <- 1:4
R1 <- CS(x,rho=.25)
image(R1)

Dominance relationship matrix

Description

C++ implementation of the dominance matrix. Calculates the realized dominance relationship matrix. Can help to increase the prediction accuracy when 2 conditions are met; 1) The trait has intermediate to high heritability, 2) The population contains a big number of individuals that are half or full sibs (HS & FS).

Usage

D.mat(X,nishio=TRUE,min.MAF=0,return.imputed=FALSE)

Arguments

X

Matrix (n \times m) of unphased genotypes for n lines and m biallelic markers, coded as {-1,0,1}. Fractional (imputed) and missing values (NA) are allowed.

nishio

If TRUE Nishio ans Satoh. (2014), otherwise Su et al. (2012). See references.

min.MAF

Minimum minor allele frequency. The D matrix is not sensitive to rare alleles, so by default only monomorphic markers are removed.

return.imputed

When TRUE, the imputed marker matrix is returned.

Details

The additive marker coefficients will be used to compute dominance coefficients as: Xd = 1-abs(X) for diploids.

For nishio method: the marker matrix is centered by subtracting column means M= Xd - ms where ms is the column means. Then A=M M'/c, where c = 2 \sum_k {p_k (1-p_k)}.

For su method: the marker matrix is normalized by subtracting row means M= Xd - 2pq where 2pq is the product of allele frequencies times 2. Then A=M M'/c, where c = 2 \sum_k {2pq_k (1-2pq_k)}.

Value

If return.imputed = FALSE, the n \times n additive relationship matrix is returned.

If return.imputed = TRUE, the function returns a list containing

$D: the D matrix
$imputed: the imputed marker matrix

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Nishio M and Satoh M. 2014. Including Dominance Effects in the Genomic BLUP Method for Genomic Evaluation. Plos One 9(1), doi:10.1371/journal.pone.0085792

Su G, Christensen OF, Ostersen T, Henryon M, Lund MS. 2012. Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS ONE 7(9): e45293. doi:10.1371/journal.pone.0045293

Examples

####=========================================####
#### EXAMPLE 1
####=========================================####
####random population of 200 lines with 1000 markers
X <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
  X[i,] <- sample(c(-1,0,0,1), size=1000, replace=TRUE)
}

D <- D.mat(X)

Epistatic relationship matrix

Description

Calculates the realized epistatic relationship matrix of second order (additive x additive, additive x dominance, or dominance x dominance) using hadamard products with the C++ Armadillo library.

Usage

E.mat(X,nishio=TRUE,type="A#A",min.MAF=0.02)

Arguments

X

Matrix (n \times m) of unphased genotypes for n lines and m biallelic markers, coded as {-1,0,1}. Fractional (imputed) and missing values (NA) are allowed.

nishio

If TRUE Nishio ans Satoh. (2014), otherwise Su et al. (2012) (see Details in the D.mat help page).

type

An argument specifying the type of epistatic relationship matrix desired. The default is the second order epistasis (additive x additive) type="A#A". Other options are additive x dominant (type="A#D"), or dominant by dominant (type="D#D").

min.MAF

Minimum minor allele frequency. The A matrix is not sensitive to rare alleles, so by default only monomorphic markers are removed.

Details

it is computed as the Hadamard product of the epistatic relationship matrix; E=A#A, E=A#D, E=D#D.

Value

The epistatic relationship matrix is returned.

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Endelman, J.B., and J.-L. Jannink. 2012. Shrinkage estimation of the realized relationship matrix. G3:Genes, Genomes, Genetics. 2:1405-1413. doi: 10.1534/g3.112.004259

Nishio M and Satoh M. 2014. Including Dominance Effects in the Genomic BLUP Method for Genomic Evaluation. Plos One 9(1), doi:10.1371/journal.pone.0085792

Examples

####=========================================####
####random population of 200 lines with 1000 markers
####=========================================####
X <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
  X[i,] <- sample(c(-1,0,0,1), size=1000, replace=TRUE)
}

E <- E.mat(X, type="A#A") 
# if heterozygote markers are present can be used "A#D" or "D#D"

Genome wide association study analysis

Description

THIS FUNCTION IS DEPRECATED. Fits a multivariate/univariate linear mixed model GWAS by likelihood methods (REML), see the Details section below. It uses the mmer function and its core coded in C++ using the Armadillo library to optimize dense matrix operations common in the derect-inversion algorithms. After the model fit extracts the inverse of the phenotypic variance matrix to perform the association test for the "p" markers. Please check the Details section (Model enabled) if you have any issue with making the function run.

The sommer package is updated on CRAN every 3-months due to CRAN policies but you can find the latest source at https://github.com/covaruber/sommer . This can be easily installed typing the following in the R console:

library(devtools)

install_github("covaruber/sommer")

This is recommended since bugs fixes will be immediately available in the GitHub source. For tutorials on how to perform different analysis with sommer please look at the vignettes by typing in the terminal:

vignette("v1.sommer.quick.start")

vignette("v2.sommer.changes.and.faqs")

vignette("v3.sommer.qg")

vignette("v4.sommer.gxe")

or visit https://covaruber.github.io

Usage


GWAS(fixed, random, rcov, data, weights, W,
    nIters=20, tolParConvLL = 1e-03, tolParInv = 1e-06, 
    init=NULL, constraints=NULL,method="NR", 
    getPEV=TRUE,naMethodX="exclude",
    naMethodY="exclude",returnParam=FALSE, 
    dateWarning=TRUE,date.warning=TRUE,verbose=FALSE,
    stepWeight=NULL, emWeight=NULL,
    M=NULL, gTerm=NULL, n.PC = 0, min.MAF = 0.05, 
    P3D = TRUE)

Arguments

fixed

A formula specifying the response variable(s) and fixed effects, i.e:

response ~ covariate for univariate models

cbind(response.i,response.j) ~ covariate for multivariate models

The fcm function can be used to constrain fixed effects in multi-response models.

random

a formula specifying the name of the random effects, i.e. random= ~ genotype + year.

Useful functions can be used to fit heterogeneous variances and other special models (see 'Special Functions' in the Details section for more information):

vsr(...,Gu,Gt,Gtc) is the main function to specify variance models and special structures for random effects. On the ... argument you provide the unknown variance-covariance structures (i.e. usr,dsr,at,csr) and the random effect where such covariance structure will be used (the random effect of interest). Gu is used to provide known covariance matrices among the levels of the random effect, Gt initial values and Gtc for constraints. Auxiliar functions for building the variance models are:

rcov

a formula specifying the name of the error term, i.e. rcov= ~ units.

The functions that can be used to fit heterogeneous residual variances are the same used on the random term but the random effect is always "units", i.e. rcov=~vsr(dsr(Location),units)

data

a data frame containing the variables specified in the formulas for response, fixed, and random effects.

weights

name of the covariate for weights. To be used for the product R = Wsi*R*Wsi, where * is the matrix product, Wsi is the square root of the inverse of W and R is the residual matrix.

W

Alternatively, instead of providing a vector of weights the user can specify an entire W matrix (e.g., when covariances exist). To be used first to produce Wis = solve(chol(W)), and then calculate R = Wsi*R*Wsi.t(), where * is the matrix product, and R is the residual matrix. Only one of the arguments weights or W should be used. If both are indicated W will be given the preference.

nIters

Maximum number of iterations allowed. Default value is 15.

tolParConvLL

Convergence criteria.

tolParInv

tolerance parameter for matrix inverse used when singularities are encountered.

init

initial values for the variance components. By default this is NULL and variance components are estimated by the method selected, but in case the user want to provide initial values for ALL var-cov components this argument is functional. It has to be provided as a list or an array, where each list element is one variance component and if multitrait model is pursued each element of the list is a matrix of variance covariance components among traits. Initial values can also be provided in the Gt argument of the vsr function.Is highly encouraged to use the Gt and Gtc arguments of the vsr function instead of this argument

constraints

when initial values are provided these have to be accompanied by their constraints. See the vsr function for more details on the constraints. Is highly encouraged to use the Gt and Gtc arguments of the vsr function instead of this argument.

method

this refers to the method or algorithm to be used for estimating variance components. Direct-inversion Newton-Raphson NR and Average Information AI (Tunnicliffe 1989; Gilmour et al. 1995; Lee et al. 2015).

getPEV

a TRUE/FALSE value indicating if the program should return the predicted error variance and variance for random effects. This option is provided since this can take a long time for certain models where p > n by a big extent.

naMethodX

one of the two possible values; "include" or "exclude". If "include" is selected then the function will impute the X matrices for fixed effects with the median value. If "exclude" is selected it will get rid of all rows with missing values for the X (fixed) covariates. The default is "exclude". The "include" option should be used carefully.

naMethodY

one of the three possible values; "include", "include2" or "exclude". If "include" is selected then the function will impute the response variables with the median value. The difference between "include" and "include2" is only available in the multitrait models when the imputation can happen for the entire matrix of responses or only for complete cases ("include2"). If "exclude" is selected it will get rid of rows in responses where missing values are present for the estimation of variance components. The default is "exclude".

returnParam

a TRUE/FALSE value to indicate if the program should return the parameters used for modeling without fitting the model.

dateWarning

a TRUE/FALSE value to indicate if the program should warn you when is time to update the sommer package.

date.warning

a TRUE/FALSE value to indicate if the program should warn you when is time to update the sommer package.

verbose

a TRUE/FALSE value to indicate if the program should return the progress of the iterative algorithm.

stepWeight

A vector of values (of length equal to the number of iterations) indicating the weight used to multiply the update (delta) for variance components at each iteration. If NULL the 1st iteration will be multiplied by 0.5, the 2nd by 0.7, and the rest by 0.9. This argument can help to avoid that variance components go outside the parameter space in the initial iterations which doesn't happen very often with the NR method but it can be detected by looking at the behavior of the likelihood. In that case you may want to give a smaller weight to the initial 8-10 iterations.

emWeight

A vector of values (of length equal to the number of iterations) indicating with values between 0 and 1 the weight assigned to the EM information matrix. And the values 1 - emWeight will be applied to the NR/AI information matrix to produce a joint information matrix. If NULL weights for EM information matrix are zero and 1 for the NR/AI information matrix.

M

The marker matrix containing the marker scores for each level of the random effect selected in the gTerm argument, coded as numeric based on the number of reference alleles in the genotype call, e.g. (-1,0,1) = (aa,Aa,AA), levels in diploid individuals. Individuals in rows and markers in columns. No additional columns should be provided, is a purely numerical matrix. Similar logic applies to polyploid individuals, e.g. (-3,-2,-1,0,1,2,3) = (aaaa,aaaA,aaAA,Aaaa,AAaa,AAAa,AAAA).

gTerm

a character vector indicating the random effect linked to the marker matrix M (i.e. the genetic term) in the model. The random effect selected should have the same number of levels than the number of rows of M. When fitting only a random effect without a special covariance structure (e.g., dsr, usr, etc.) you will need to add the call 'u:' to the name of the random effect given the behavior of the naming rules of the solver when having a simple random effect without covariance structure.

n.PC

Number of principal components to include as fixed effects. Default is 0 (equals K model).

min.MAF

Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is assigned a zero score.

P3D

When P3D=TRUE, variance components are estimated by REML only once, without any markers in the model and then a for loop for hypothesis testing is performed. When P3D=FALSE, variance components are estimated by REML for each marker separately. The latter can be quite time consuming. As many models will be run as number of marker.

Details

Citation

Type citation("sommer") to know how to cite the sommer package in your publications.

Models Enabled

For details about the models enabled and more information about the covariance structures please check the help page of the package (sommer). In general the GWAS model implemented in sommer to obtain marker effect is a generalized linear model of the form:

b = (X'V-X)X'V-y

with X = ZMi

where: b is the marker effect (dimensions 1 x mt) y is the response variable (univariate or multivariate) (dimensions 1 x nt) V- is the inverse of the phenotypic variance matrix (dimensions nt x nt) Z is the incidence matrix for the random effect selected (gTerm argument) to perform the GWAS (dimensions nt x ut) Mi is the ith column of the marker matrix (M argument) (dimensions u x m)

for t traits, n observations, m markers and u levels of the random effect. Depending if P3D is TRUE or FALSE the V- matrix will be calculated once and used for all marker tests (P3D=TRUE) or estimated through REML for each marker (P3D=FALSE).

vignette('sommer.start')

Bug report and contact

If you have any technical questions or suggestions please post it in https://stackoverflow.com or https://stats.stackexchange.com.

If you have any bug report please go to https://github.com/covaruber/sommer or send me an email to address it asap.

Value

If all parameters are correctly indicated the program will return a list with the following information:

Vi

the inverse of the phenotypic variance matrix V^- = (ZGZ+R)^-1

sigma

a list with the values of the variance-covariance components with one list element for each random effect.

sigma_scaled

a list with the values of the scaled variance-covariance components with one list element for each random effect.

sigmaSE

Hessian matrix containing the variance-covariance for the variance components. SE's can be obtained taking the square root of the diagonal values of the Hessian.

Beta

a data frame for trait BLUEs (fixed effects).

VarBeta

a variance-covariance matrix for trait BLUEs

U

a list (one element for each random effect) with a data frame for trait BLUPs.

VarU

a list (one element for each random effect) with the variance-covariance matrix for trait BLUPs.

PevU

a list (one element for each random effect) with the predicted error variance matrix for trait BLUPs.

fitted

Fitted values y.hat=XB

residuals

Residual values e = Y - XB

AIC

Akaike information criterion

BIC

Bayesian information criterion

convergence

a TRUE/FALSE statement indicating if the model converged.

monitor

The values of log-likelihood and variance-covariance components across iterations during the REML estimation.

scores

marker scores (-log_(10)p) for the traits

method

The method for extimation of variance components specified by the user.

constraints

contraints used in the mixed models for the random effects.

Author(s)

Giovanny Covarrubias-Pazaran

References

Covarrubias-Pazaran G. Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 2016, 11(6): doi:10.1371/journal.pone.0156744

Covarrubias-Pazaran G. 2018. Software update: Moving the R package sommer to multivariate mixed models for genome-assisted prediction. doi: https://doi.org/10.1101/354639

Bernardo Rex. 2010. Breeding for quantitative traits in plants. Second edition. Stemma Press. 390 pp.

Gilmour et al. 1995. Average Information REML: An efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4):1440-1450.

Kang et al. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709-1723.

Lee, D.-J., Durban, M., and Eilers, P.H.C. (2013). Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Computational Statistics and Data Analysis, 61, 22 - 37.

Lee et al. 2015. MTG2: An efficient algorithm for multivariate linear mixed model analysis based on genomic information. Cold Spring Harbor. doi: http://dx.doi.org/10.1101/027201.

Maier et al. 2015. Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder. Am J Hum Genet; 96(2):283-294.

Rodriguez-Alvarez, Maria Xose, et al. Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics 23 (2018): 52-71.

Searle. 1993. Applying the EM algorithm to calculating ML and REML estimates of variance components. Paper invited for the 1993 American Statistical Association Meeting, San Francisco.

Yu et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Genetics 38:203-208.

Tunnicliffe W. 1989. On the use of marginal likelihood in time series model estimation. JRSS 51(1):15-27.

Zhang et al. 2010. Mixed linear model approach adapted for genome-wide association studies. Nat. Genet. 42:355-360.

Examples


####=========================================####
#### For CRAN time limitations most lines in the 
#### examples are silenced with one '#' mark, 
#### remove them and run the examples using
#### command + shift + C |OR| control + shift + C
####=========================================####
#####========================================####
##### potato example
#####========================================####
# 
# data(DT_polyploid, package="enhancer")
# DT <- DT_polyploid
# GT <- GT_polyploid
# MP <- MP_polyploid
# ####=========================================####
# ####### convert markers to numeric format
# ####=========================================####
# numo <- atcg1234(data=GT, ploidy=4);
# numo$M[1:5,1:5];
# numo$ref.allele[,1:5]
# 
# ###=========================================####
# ###### plants with both genotypes and phenotypes
# ###=========================================####
# common <- intersect(DT$Name,rownames(numo$M))
# 
# ###=========================================####
# ### get the markers and phenotypes for such inds
# ###=========================================####
# marks <- numo$M[common,]; marks[1:5,1:5]
# DT2 <- DT[match(common,DT$Name),];
# DT2 <- as.data.frame(DT2)
# DT2[1:5,]
# 
# ###=========================================####
# ###### Additive relationship matrix, specify ploidy
# ###=========================================####
# A <- A.mat(marks)
# ###=========================================####
# ### run it as GWAS model
# ###=========================================####
# ans2 <- GWAS(tuber_shape~1,
#              random=~vsr(Name,Gu=A),
#              rcov=~units,
#              gTerm = "u:Name",
#              M=marks, data=DT2)
# plot(ans2$scores[,1])

Combined relationship matrix H

Description

Given a matrix A and a matrix G returns a H matrix with the C++ Armadillo library.

Usage

H.mat(A, G, tau = 1, omega = 1, tolparinv=1e-6)

Arguments

A

Additive relationship matrix based on pedigree.

G

Additive relationship matrix based on marker data.

tau

As described by Martini et al. (2018).

omega

As described by Martini et al. (2018).

tolparinv

Tolerance parameter for matrix inverse used when singularities are encountered in the estimation procedure.

Details

See references

Value

H Matrix with the relationship between the individuals based on pedigree and corrected by molecular information

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Martini, J. W., Schrauf, M. F., Garcia-Baccino, C. A., Pimentel, E. C., Munilla, S., Rogberg-Munoz, A., ... & Simianer, H. (2018). The effect of the H-1 scaling factors tau and omega on the structure of H in the single-step procedure. Genetics Selection Evolution, 50(1), 16.

Examples

####=========================================####
####random population of 200 lines with 1000 markers
####=========================================####
M <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
  M[i,] <- sample(c(-1,0,0,1), size=1000, replace=TRUE)
}
rownames(M) <- 1:nrow(M)
v <- sample(1:nrow(M),100)
M2 <- M[v,]

A <- A.mat(M) # assume this is a pedigree-based matrix for the sake of example
G <- A.mat(M2)

H <- H.mat(A,G)
# colfunc <- colorRampPalette(c("steelblue4","springgreen","yellow"))
# hv <- heatmap(H[1:15,1:15], col = colfunc(100),Colv = "Rowv")

Multivariate Newton-Raphson algorithm

Description

Multivariate Newton-Raphson algorithm used behind mmes when the henderson argument is set to FALSE. Algorithm made available for users that want to avoid the user-friendly interface and provide their matrices directly.

Usage

MNR(Y, # response
    X, Gx, # fixed effects
    Z, K, # random effects
    R,  # residual effects
    Ge, GeI, # initial values and constraints
    W, isInvW, # weights matrix
    iters, tolpar, tolparinv, # other params
    ai, pev,
    verbose, retscaled,
    stepweight, emweight
    )

Arguments

Y

A matrix with rows for records and columns for traits. Expected class is a 'matrix'.

X

Design matrices for fixed effects. Expected class is a 'list' with as many design matrices as fixed effects. Matrices within the list should be of class 'matrix'

Gx

Trait multiplier matrix for fixed effects. Expected class is a 'list' with as many matrices as fixed effects. Each matrix is a square matrix with as many rows and columns as number of traits. Each matrix is of class 'matrix'

Z

Design matrices for random effects. Expected class is a 'list' with as many design matrices as random effects. Each element in the list should be a matrix of class 'dgCMatrix'

K

Covariance matrices for design matrices of random effects. Expected class is a 'list' with as many covariance matrices as random effects specified in Z. Each element in the list should be a matrix of class 'dgCMatrix'

R

Residual matrices for residual effects. Expected class is a 'list' with as many residual matrices as residual effects. Each element in the list should be a square matrix of dimensions n x n, where n is the number of records. Each matrix should be of class 'matrix'

Ge

Initial values for variance components. Expected class is a 'list' with as many variance component matrices as random effects specified in Z. Each element in the list should be a matrix of dimensions t x t, where t is the number of traits and of class 'matrix'. Initial values are any real values.

GeI

Initial constraints for variance components. Expected class is a 'list' with as many variance component constraints matrices as random effects specified in Z. Each element in the list should be a matrix of dimensions t x t, where t is the number of traits and of class 'matrix'. Values expected are:

0: not to be estimated

1: estimated and constrained to be positive (i.e. variance component)

2: estimated and unconstrained (can be negative or positive, i.e. covariance component)

3: not to be estimated but fixed (value has to be provided in the Gti argument)

Please notice that lower triangular values of these matrices have to be equal to zero since only the values in the upper triangular are estimated.

W

Design matrix for weights. A matrix of class 'matrix' for weighting the records.

isInvW

A value of class 'logical' to indicate if the W matrix provided is already an inverse or not. This aims to speed up the computations.

iters

Maximum number of iterations allowed in REML. Value is of class 'integer'.

tolpar

Convergence criteria for the change in log-likelihood. Value is of class 'numeric'.

tolparinv

Tolerance parameter for matrix inverse used when singularities are encountered in the estimation procedure. Value is of class 'numeric'.

ai

A value of class 'logical' to indicate if the Average Information algorithm should be used instead. That is faster but much less stable.

pev

A value of class 'logical' to indicate if the predicted error variance should be computed or not. If FALSE computations are speeded up for models with many effects to be estimated.

verbose

A value of class 'logical' to value to indicate if the program should return the progress of the iterative algorithm.

retscaled

A value of class 'logical' to indicate if we should avoid scaling the traits.

stepweight

Vector of class 'numeric' with length equal to niters to specify the relative weight given to the second derivative Newton update.

emweight

Vector of class 'numeric' with length equal to niters to specify the relative weight given to the EM update compared to the NR update.

Details

This is the Rcpp-coded Direct-Inversion LMM REML algorithm used behind the mmer function.

Value

If all parameters are correctly indicated the program will return a list with the following information:

Vi

the inverse of the phenotypic variance matrix V^- = (ZGZ+R)^-1

P

the projection matrix Vi - [Vi*(X*Vi*X)^-*Vi]

sigma

a list with the values of the variance-covariance components with one list element for each random effect.

sigma_scaled

a list with the values of the scaled variance-covariance components with one list element for each random effect.

sigmaSE

Hessian matrix containing the variance-covariance for the variance components. SE's can be obtained taking the square root of the diagonal values of the Hessian.

Beta

a data frame for trait BLUEs (fixed effects).

VarBeta

a variance-covariance matrix for trait BLUEs

U

a list (one element for each random effect) with a data frame for trait BLUPs.

VarU

a list (one element for each random effect) with the variance-covariance matrix for trait BLUPs.

PevU

a list (one element for each random effect) with the predicted error variance matrix for trait BLUPs.

fitted

Fitted values y.hat=XB

residuals

Residual values e = Y - XB

AIC

Akaike information criterion

BIC

Bayesian information criterion

convergence

a TRUE/FALSE statement indicating if the model converged.

monitor

The values of log-likelihood and variance-covariance components across iterations during the REML estimation.

percChange

The percent change of variance components across iterations. There should be one column less than the number of iterations. Calculated as percChange = ((x_i/x_i-1) - 1) * 100 where i is the ith iteration.

dL

The vector of first derivatives of the likelihood with respect to the ith variance-covariance component.

dL2

The matrix of second derivatives of the likelihood with respect to the i.j th variance-covariance component.

References

Mrode, R. A. (2014). Linear models for the prediction of animal breeding values. Cabi.

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples


data(DT_cpdata, package="enhancer")
DT <- DT_cpdata

# response matrix
y <- cbind(imputev(DT$Yield), imputev(DT$Firmness))
# fixed effect incidence matrix
X <- model.matrix(~Rowf, data=DT)
# random effect incidence matrix
Z <- Matrix::sparse.model.matrix(~id - 1, data=DT)
colnames(Z) <- gsub("id","",colnames(Z))
# covariance for random effect
GT <- GT_cpdata
A <- A.mat(GT) # additive relationship matrix
A <- A[colnames(Z),colnames(Z)] # make sure of the order
A <- A + diag(1e-4,nrow(A), nrow(A))
# residual effect incidence matrix (dimensions equal nrow(y))
R1 <- Matrix::Diagonal(n=nrow(y)) 
# weights matrix (dimensions equal nrow(y))
W <- diag(nrow(y)) 
# some other parameters
maxIter=3
stepWeight <- rep(0.9, maxIter)
stepWeight[1:2] <- c(0.5, 0.7)
emWeights <- rep(0,maxIter)
# model fit
res <- MNR( Y=y, # multi-trait response
            X=list(X), Gx=list(diag(2)), # fixed effects
            Z=list(Z), K=list(A), # random effects
            R=list(R1), # residual effects
            Ge=list( (diag(2)*.3)+.15 , diag(2)*0.75 ), # inital vc 
            GeI=list(unsm2(2),diag(2)), # vc constraints
            W=W, isInvW=TRUE, # weights for records
            iters=maxIter,
            tolpar=1e-4, tolparinv=1e-6, # tolerance
            ai=FALSE, pev=FALSE, # algorithm specifics
            verbose=TRUE, 
            retscaled=FALSE, 
            stepweight=stepWeight, # second derivatives weights
            emweight=emWeights # em update weights
)
res$sigma
res$Beta
res$U[[1]]

anova form a GLMM fitted with mmes

Description

anova method for class "mmes".

Usage

## S3 method for class 'mmes'
anova(object, object2=NULL, ...)

Arguments

object

an object of class "mmes"

object2

an object of class "mmes", if NULL the program will provide regular sum of squares results.

...

Further arguments to be passed

Value

vector of anova

Author(s)

Giovanny Covarrubias

atm covariance structure

Description

atm creates a diagonal covariance structure for specific levels of the random effect to be used with the mmes solver.

Usage

  atm(x, levs, thetaC, theta)

Arguments

x

vector of observations for the random effect.

levs

levels of the random effect to use for building the incidence matrices.

thetaC

an optional symmetric matrix for constraints in the variance-covariance components. The symmetric matrix should have as many rows and columns as the number of levels in the factor 'x'. The values in the matrix define how the variance-covariance components should be estimated:

0: component will not be estimated

1: component will be estimated and constrained to be positive

2: component will be estimated and unconstrained

3: component will be fixed to the value provided in the theta argument

theta

an optional symmetric matrix for initial values of the variance-covariance components. The symmetric matrix should have as many rows and columns as the number of levels in the factor 'x'. The values in the matrix define the initial values of the variance-covariance components that will be subject to the constraints provided in thetaC. If not provided, initial values will be calculated as:

theta* = diag(ncol(mm))*.05 + matrix(.1,ncol(mm),ncol(mm))

where mm is the incidence matrix for the factor 'x'. The values provided should be scaled by the variance of the response variable.

theta = theta*/var(y)

Value

$res: a list with the provided vector and the variance covariance structure expected.

Author(s)

Giovanny Covarrubias-Pazaran

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples

x <- as.factor(c(1:5,1:5,1:5));x
atm(x, c("1","2"))
## how to use the theta and thetaC arguments:
# data(DT_example, package="enhancer")
# DT <- DT_example
# theta <- diag(2)*2; theta # initial VCs
# thetaC <- diag(2)*3; thetaC # fixed VCs
# ans1 <- mmes(Yield~Env,
#              random= ~ vsm( atm(Env, levs=c("CA.2013", "CA.2011"),
#                             theta = theta,thetaC = thetaC),ism(Name) ),
#              rcov= ~ units, nIters = 1,
#              data=DT)
# summary(ans1)$varcomp

coef form a GLMM fitted with mmes

Description

coef method for class "mmes".

Usage

## S3 method for class 'mmes'
coef(object, ...)

Arguments

object

an object of class "mmes"

...

Further arguments to be passed

Value

vector of coef

Author(s)

Giovanny Covarrubias

Imputing a matrix using correlations

Description

corImputation imputes missing data based on the correlation that exists between row levels.

Usage

  corImputation(wide, Gu=NULL, nearest=10, roundR=FALSE)

Arguments

wide

numeric matrix with individuals in rows and time variable in columns (e.g., environments, genetic markers, etc.).

Gu

optional correlation matrix between the individuals or row levels. If NULL it will be computed as the correlation of t(wide).

nearest

integer value describing how many nearest neighbours (the ones showing the highest correlation) should be used to average and return the imputed value.

roundR

a TRUE/FALSE statement describing if the average result should be rounded or not. This may be specifically useful for categorical data in the form of numbers (e.g., -1,0,1).

Value

$res: a list with the imputed matrix and the original matrix.

Author(s)

Giovanny Covarrubias-Pazaran

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples


####################################
### imputing genotype data example
####################################
# data(DT_cpdata, package="enhancer")
# X <- GT_cpdata
# # add missing data
# v <- sample(1:length(X), 500)
# Xna <- X
# Xna[v]<- NA
# ## impute (can take some time)
# Y <- corImputation(wide=Xna, Gu=NULL, nearest=20, roundR=TRUE) 
# cm <- table(Y$imputed[v],X[v])
# ## calculate accuracy
# sum(diag(cm))/length(v)
####################################
### imputing phenotypic data example
####################################
# data(DT_h2, package="enhancer")
# X <- reshape(DT_h2[,c("Name","Env","y")], direction = "wide", idvar = "Name",
#                 timevar = "Env", v.names = "y", sep= "_")
# rownames(X) <- X$Name
# X <- as.matrix(X[,-1])
# head(X)
# # add missing data
# v <- sample(1:length(X), 50)
# Xna <- X
# Xna[v]<- NA
# ## impute
# Y <- corImputation(wide=Xna, Gu=NULL, nearest=20, roundR=TRUE)
# plot(y=Y$imputed[v],x=X[v], xlab="true",ylab="predicted")
# cor(Y$imputed[v],X[v], use = "complete.obs")

covariance between random effects

Description

covm merges the incidence matrices and covariance matrices of two random effects to fit an unstructured model between 2 different random effects to be fitted with the mmes solver.

Usage

  covm(ran1, ran2, thetaC=NULL, theta=NULL)

Arguments

ran1

the random call of the first random effect.

ran2

the random call of the first random effect.

thetaC

an optional matrix for constraints in the variance components.

theta

an optional symmetric matrix for initial values of the variance-covariance components. When providing customized values, these values should be scaled with respect to the original variance. For example, to provide an initial value of 1 to a given variance component, theta would be built as:

theta = matrix( 1 / var(response) )

The symmetric matrix should have as many rows and columns as the number of levels in the factor 'x'. The values in the matrix define the initial values of the variance-covariance components that will be subject to the constraints provided in thetaC. If not provided, initial values will be calculated as:

theta = diag(ncol(mm))*.05 + matrix(.1,ncol(mm),ncol(mm))

where mm is the incidence matrix for the factor 'x'.

Details

This implementation aims to fit models where covariance between random variables is expected to exist. For example, indirect genetic effects.

Value

$Z: a incidence matrix Z* = Z Gamma which is the original incidence matrix for the timevar multiplied by the loadings.

Author(s)

Giovanny Covarrubias-Pazaran

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Bijma, P. (2014). The quantitative genetics of indirect genetic effects: a selective review of modelling issues. Heredity, 112(1), 61-69.

Examples


data(DT_ige, package="enhancer")
DT <- DT_ige
covRes <- with(DT, covm( vsm(ism(focal)) , vsm(ism(neighbour)) ) )
str(covRes)
# look at DT_ige help page to see how to fit an actual model

customized covariance structure

Description

csm creates a customized covariance structure for specific levels of the random effect to be used with the mmes solver.

Usage

  csm(x, mm, thetaC, theta)

Arguments

x

vector of observations for the random effect.

mm

customized variance-covariance structure for the levels of the random effect.

thetaC

0: component will not be estimated

1: component will be estimated and constrained to be positive

2: component will be estimated and unconstrained

3: component will be fixed to the value provided in the theta argument

theta

theta = matrix( 1 / var(response) )

theta = diag(ncol(mm))*.05 + matrix(.1,ncol(mm),ncol(mm))

where mm is the incidence matrix for the factor 'x'.

Value

$res: a list with the provided vector and the variance covariance structure expected for the levels of the random effect.

Author(s)

Giovanny Covarrubias-Pazaran

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples

x <- as.factor(c(1:5,1:5,1:5));x
csm(x,matrix(1,5,5))

data frame to matrix

Description

This function takes a matrix that is in data frame format and transforms it into a matrix. Other packages that allows you to obtain an additive relationship matrix from a pedigree is the 'pedigreemm' package.

Usage

  dfToMatrix(x, row="Row",column="Column",
             value="Ainverse", returnInverse=FALSE, 
             bend=1e-6)

Arguments

x

ginv element, output from the Ainverse function.

row

name of the column in x that indicates the row in the original relationship matrix.

column

name of the column in x that indicates the column in the original relationship matrix.

value

name of the column in x that indicates the value for a given row and column in the original relationship matrix.

returnInverse

a TRUE/FALSE value indicating if the inverse of the x matrix should be computed once the data frame x is converted into a matrix.

bend

a numeric value to add to the diagonal matrix in case matrix is singular for inversion.

Value

K

pedigree transformed in a relationship matrix.

Kinv

inverse of the pedigree transformed in a relationship matrix.

Author(s)

Giovanny Covarrubias-Pazaran

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples


library(Matrix)
m <- matrix(1:9,3,3)
m <- tcrossprod(m)

mdf <- as.data.frame(as.table(m))
mdf

dfToMatrix(mdf, row = "Var1", column = "Var2", 
            value = "Freq",returnInverse=FALSE )

diagonal covariance structure

Description

dsm creates a diagonal covariance structure for the levels of the random effect to be used with the mmes solver.

Usage

  dsm(x, thetaC=NULL, theta=NULL)

Arguments

x

vector of observations for the random effect.

thetaC

0: component will not be estimated

1: component will be estimated and constrained to be positive

2: component will be estimated and unconstrained

3: component will be fixed to the value provided in the theta argument

theta

theta = matrix( 1 / var(response) )

diag(ncol(mm))*.05 + matrix(.1,ncol(mm),ncol(mm))

where mm is the incidence matrix for the factor 'x'.

Value

$res: a list with the provided vector and the variance covariance structure expected for the levels of the random effect.

Author(s)

Giovanny Covarrubias-Pazaran

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples

x <- as.factor(c(1:5,1:5,1:5));x
dsm(x)
## how to use the theta and thetaC arguments:
# data(DT_example, package="enhancer")
# DT <- DT_example
# theta <- diag(3)*2; theta # initial VCs
# thetaC <- diag(3)*3; thetaC # fixed VCs
# ans1 <- mmes(Yield~Env,
#              random= ~ vsm( dsm(Env,theta = theta,thetaC = thetaC),ism(Name) ),
#              rcov= ~ units,
#              data=DT)
# summary(ans1)$varcomp

fitted form a LMM fitted with mmes

Description

fitted method for class "mmes".

Usage

## S3 method for class 'mmes'
fitted(object, ...)

Arguments

object

an object of class "mmes"

...

Further arguments to be passed to the mmes function

Value

vector of fitted values of the form y.hat = Xb + Zu including all terms of the model.

Author(s)

Giovanny Covarrubias

Examples

# data(DT_cpdata, package="enhancer")
# DT <- DT_cpdata
# GT <- GT_cpdata
# MP <- MP_cpdata
# #### create the variance-covariance matrix
# A <- A.mat(GT) # additive relationship matrix
# #### look at the data and fit the model
# head(DT)
# mix1 <- mmes(Yield~1,
#               random=~vsm(ism(id),Gu=A)
#                       + Rowf + Colf + spl2Dc(Row,Col),
#               rcov=~units,
#               data=DT)
# 
# ff=fitted(mix1)
# 
# colfunc <- colorRampPalette(c("steelblue4","springgreen","yellow"))
# lattice::wireframe(`u:Row.fitted`~Row*Col, data=ff$dataWithFitted,  
#           aspect=c(61/87,0.4), drape=TRUE,# col.regions = colfunc,
#           light.source=c(10,0,10))
# lattice::levelplot(`u:Row.fitted`~Row*Col, data=ff$dataWithFitted, col.regions = colfunc)

fixed indication matrix

Description

fixm creates a square matrix with 3's in the diagnals and off-diagonals to quickly specify a fixed constraint in the Gtc argument of the vsm function.

Usage

  fixm(x, reps=NULL)

Arguments

x

integer specifying the number of traits to be fitted for a given random effect.

reps

integer specifying the number of times the matrix should be repeated in a list format to provide easily the constraints in complex models that use the ds(), us() or cs() structures.

Value

$res: a matrix or a list of matrices with the constraints to be provided in the Gtc argument of the vsm function.

Author(s)

Giovanny Covarrubias-Pazaran

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples

fixm(4)
fixm(4,2)

identity covariance structure

Description

ism creates an identity covariance structure for the levels of the random effect to be used with the mmes solver. Any random effect with a special covariance structure should end with an ism() structure.

Usage

  ism(x, thetaC=NULL, theta=NULL)

Arguments

x

vector of observations for the random effect.

thetaC

an optional 1 x 1 matrix for constraints in the variance-covariance components. The values in the matrix define how the variance-covariance components should be estimated:

0: component will not be estimated

1: component will be estimated and constrained to be positive (default)

2: component will be estimated and unconstrained

3: component will be fixed to the value provided in the theta argument

theta

an optional 1 x 1 matrix for initial values of the variance-covariance component. When providing customized values, these values should be scaled with respect to the original variance. For example, to provide an initial value of 1 to a given variance component, theta would be built as:

theta = matrix( 1 / var(response) )

The values in the matrix define the initial values of the variance-covariance components that will be subject to the constraints provided in thetaC. If not provided, initial values (theta) will be 0.15

Value

$res: a list with the provided vector and the variance covariance structure expected for the levels of the random effect.

Author(s)

Giovanny Covarrubias-Pazaran

References

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples

x <- as.factor(c(1:5,1:5,1:5));x
ism(x)

# data(DT_example, package="enhancer")
# ans1 <- mmes(Yield~Env,
#              random= ~ vsm( ism( Name ) ),
#              data=DT_example)
# summary(ans1)$varcomp

mixed model equations for r records

Description

The mmer function uses the Direct-Inversion Newton-Raphson or Average Information coded in C++ using the Armadillo library to optimize dense matrix operations common in genomic selection models. These algorithms are intended to be used for problems of the type c > r (more coefficients to estimate than records in the dataset) and/or dense matrices. For problems with sparse data, or problems of the type r > c (more records in the dataset than coefficients to estimate), the MME-based algorithm in the mmes function is faster and we recommend to shift to use that function.

Usage


mmer(fixed, random, rcov, data, weights, W, nIters=20, tolParConvLL = 1e-03, 
     tolParInv = 1e-06, init=NULL, constraints=NULL,method="NR", getPEV=TRUE,
     naMethodX="exclude", naMethodY="exclude",returnParam=FALSE, 
     dateWarning=TRUE,date.warning=TRUE,verbose=TRUE, reshapeOutput=TRUE, stepWeight=NULL,
     emWeight=NULL, contrasts=NULL)

Arguments

fixed

A formula specifying the response variable(s) and fixed effects, e.g.:

response ~ covariate for univariate models

cbind(response.i,response.j) ~ covariate for multivariate models

The fcm function can be used to constrain fixed effects in multi-response models.

random

A formula specifying the name of the random effects, e.g. random= ~ genotype + year.

Useful functions can be used to fit heterogeneous variances and other special models (see 'Special Functions' in the Details section for more information):

vsr(...,Gu,Gti,Gtc) is the main function to specify variance models and special structures for random effects. On the ... argument you provide the unknown variance-covariance structures (e.g., usr,dsr,atr,csr) and the random effect where such covariance structure will be used (the random effect of interest). Gu is used to provide known covariance matrices among the levels of the random effect, Gti initial values and Gtc for constraints. Auxiliar functions for building the variance models are:

** dsr(x), usr(x), csr(x) and atr(x,levs) can be used to specify unknown diagonal, unstructured and customized unstructured and diagonal covariance structures to be estimated by REML.

** unsm(x), fixm(x) and diag(x) can be used to build easily matrices to specify constraints in the Gtc argument of the vsr() function.

** overlay(), spl2Da(), spl2Db(), and leg() functions can be used to specify overlayed of design matrices of random effects, two dimensional spline and random regression models within the vsr() function.

gvsr(...,Gu,Guc,Gti,Gtc) is an alternative function to specify general variance structures between different random effects. An special case in the indirect genetic effect models. Is similar to the vsr function but in the ... argument the different random effects are provided.

rcov

A formula specifying the name of the error term, e.g., rcov= ~ units.

Special heterogeneous and special variance models and constraints for the residual part are the same used on the random term but the name of the random effect is always "units" which can be thought as a column with as many levels as rows in the data, e.g., rcov=~vsr(dsr(covariate),units)

data

A data frame containing the variables specified in the formulas for response, fixed, and random effects.

weights

Name of the covariate for weights. To be used for the product R = Wsi*R*Wsi, where * is the matrix product, Wsi is the square root of the inverse of W and R is the residual matrix.

W

nIters

Maximum number of iterations allowed.

tolParConvLL

Convergence criteria for the change in log-likelihood.

tolParInv

Tolerance parameter for matrix inverse used when singularities are encountered in the estimation procedure.

init

Initial values for the variance components. By default this is NULL and initial values for the variance components are provided by the algorithm, but in case the user want to provide initial values for ALL var-cov components this argument is functional. It has to be provided as a list, where each list element corresponds to one random effect (1x1 matrix) and if multitrait model is pursued each element of the list is a matrix of variance covariance components among traits for such random effect. Initial values can also be provided in the Gti argument of the vsr function. Is highly encouraged to use the Gti and Gtc arguments of the vsr function instead of this argument, but these argument can be used to provide all initial values at once

constraints

When initial values are provided these have to be accompanied by their constraints. See the vsr function for more details on the constraints. Is highly encouraged to use the Gti and Gtc arguments of the vsr function instead of this argument but these argument can be used to provide all constraints at once.

method

This refers to the method or algorithm to be used for estimating variance components. Direct-inversion Newton-Raphson NR and Average Information AI (Tunnicliffe 1989; Gilmour et al. 1995; Lee et al. 2015).

getPEV

A TRUE/FALSE value indicating if the program should return the predicted error variance and variance for random effects. This option is provided since this can take a long time for certain models where p is > n by a big extent.

naMethodX

One of the two possible values; "include" or "exclude". If "include" is selected then the function will impute the X matrices for fixed effects with the median value. If "exclude" is selected it will get rid of all rows with missing values for the X (fixed) covariates. The default is "exclude". The "include" option should be used carefully.

naMethodY

One of the three possible values; "include", "include2" or "exclude" (default) to treat the observations in response variable to be used in the estimation of variance components. The first option "include" will impute the response variables for all rows with the median value, whereas "include2" imputes the responses only for rows where there is observation(s) for at least one of the responses (only available in the multi-response models). If "exclude" is selected (default) it will get rid of rows in response(s) where missing values are present for at least one of the responses.

returnParam

A TRUE/FALSE value to indicate if the program should return the parameters to be used for fitting the model instead of fitting the model.

dateWarning

A TRUE/FALSE value to indicate if the program should warn you when is time to update the sommer package.

date.warning

A TRUE/FALSE value to indicate if the program should warn you when is time to update the sommer package. This argument will be removed soon, just left for backcompatibility.

verbose

A TRUE/FALSE value to indicate if the program should return the progress of the iterative algorithm.

reshapeOutput

A TRUE/FALSE value to indicate if the output should be reshaped to be easier to interpret for the user, some information is missing from the multivariate models for an easy interpretation.

stepWeight

emWeight

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

Details

The use of this function requires a good understanding of mixed models. Please review the 'sommer.quick.start' vignette and pay attention to details like format of your random and fixed variables (e.g. character and factor variables have different properties when returning BLUEs or BLUPs, please see the 'sommer.changes.and.faqs' vignette).

For tutorials on how to perform different analysis with sommer please look at the vignettes by typing in the terminal:

vignette("v1.sommer.quick.start")

vignette("v2.sommer.changes.and.faqs")

vignette("v3.sommer.qg")

vignette("v4.sommer.gxe")

Citation

Type citation("sommer") to know how to cite the sommer package in your publications.

Special variance structures

vsr(atr(x,levels),y)

can be used to specify heterogeneous variance for the "y" covariate at specific levels of the covariate "x", e.g., random=~vsr(at(Location,c("A","B")),ID) fits a variance component for ID at levels A and B of the covariate Location.

vsr(dsr(x),y)

can be used to specify a diagonal covariance structure for the "y" covariate for all levels of the covariate "x", e.g., random=~vsr(dsr(Location),ID) fits a variance component for ID at all levels of the covariate Location.

vsr(usr(x),y)

can be used to specify an unstructured covariance structure for the "y" covariate for all levels of the covariate "x", e.g., random=~vsr(usr(Location),ID) fits variance and covariance components for ID at all levels of the covariate Location.

vsr(overlay(...,rlist=NULL,prefix=NULL))

can be used to specify overlay of design matrices between consecutive random effects specified, e.g., random=~vsr(overlay(male,female)) overlays (overlaps) the incidence matrices for the male and female random effects to obtain a single variance component for both effects. The 'rlist' argument is a list with each element being a numeric value that multiplies the incidence matrix to be overlayed. See overlay for details.Can be combined with vsr().

vsr(leg(x,n),y)

can be used to fit a random regression model using a numerical variable x that marks the trayectory for the random effect y. The leg function can be combined with the special functions dsr, usr at and csr. For example random=~vsr(leg(x,1),y) or random=~vsr(usr(leg(x,1)),y).

vsr(x,Gtc=fcm(v))

can be used to constrain fixed effects in the multi-response mixed models. This is a vector that specifies if the fixed effect is to be estimated for such trait. For example fixed=cbind(response.i, response.j)~vsr(Rowf, Gtc=fcm(c(1,0))) means that the fixed effect Rowf should only be estimated for the first response and the second should only have the intercept.

gvsr(x,y)

can be used to fit variance and covariance parameters between two or more random effects. For example, indirect genetic effect models.

spl2Da(x.coord, y.coord, at.var, at.levels))

can be used to fit a 2-dimensional spline (e.g., spatial modeling) using coordinates x.coord and y.coord (in numeric class) assuming a single variance component. The 2D spline can be fitted at specific levels using the at.var and at.levels arguments. For example random=~spl2Da(x.coord=Row,y.coord=Range,at.var=FIELD).

spl2Db(x.coord, y.coord, at.var, at.levels))

can be used to fit a 2-dimensional spline (e.g., spatial modeling) using coordinates x.coord and y.coord (in numeric class) assuming multiple variance components. The 2D spline can be fitted at specific levels using the at.var and at.levels arguments. For example random=~spl2Db(x.coord=Row,y.coord=Range,at.var=FIELD).

S3 methods

S3 methods are available for some parameter extraction such as fitted.mmer, residuals.mmer, summary.mmer, randef, coef.mmer, anova.mmer, plot.mmer, and predict.mmer to obtain adjusted means. In addition, the vpredict function (replacement of the pin function) can be used to estimate standard errors for linear combinations of variance components (e.g., ratios like h2).

Additional Functions

Additional functions for genetic analysis have been included such as relationship matrix building (A.mat, D.mat, E.mat, H.mat), build a genotypic hybrid marker matrix (build.HMM), plot of genetic maps (map.plot), and manhattan plots (manhattan). If you need to build a pedigree-based relationship matrix use the getA function from the pedigreemm package.

Bug report and contact

If you have any technical questions or suggestions please post it in https://stackoverflow.com or https://stats.stackexchange.com

If you have any bug report please go to https://github.com/covaruber/sommer or send me an email to address it asap, just make sure you have read the vignettes carefully before sending your question.

Example Datasets

The package has been equiped with several datasets to learn how to use the sommer package:

* DT_halfdiallel, DT_fulldiallel and DT_mohring datasets have examples to fit half and full diallel designs.

* DT_h2 to calculate heritability

* DT_cornhybrids and DT_technow datasets to perform genomic prediction in hybrid single crosses

* DT_wheat dataset to do genomic prediction in single crosses in species displaying only additive effects.

* DT_cpdata dataset to fit genomic prediction models within a biparental population coming from 2 highly heterozygous parents including additive, dominance and epistatic effects.

* DT_polyploid to fit genomic prediction and GWAS analysis in polyploids.

* DT_gryphon data contains an example of an animal model including pedigree information.

* DT_btdata dataset contains an animal (birds) model.

* DT_legendre simulated dataset for random regression model.

* DT_sleepstudy dataset to know how to translate lme4 models to sommer models.

* DT_ige dataset to show how to fit indirect genetic effect models.

Models Enabled

For details about the models enabled and more information about the covariance structures please check the help page of the package (sommer).

Value

If all parameters are correctly indicated the program will return a list with the following information:

Vi

the inverse of the phenotypic variance matrix V^- = (ZGZ+R)^-1

P

the projection matrix Vi - [Vi*(X*Vi*X)^-*Vi]

sigma

a list with the values of the variance-covariance components with one list element for each random effect.

sigma_scaled

a list with the values of the scaled variance-covariance components with one list element for each random effect.

sigmaSE

Hessian matrix containing the variance-covariance for the variance components. SE's can be obtained taking the square root of the diagonal values of the Hessian.

Beta

a data frame for trait BLUEs (fixed effects).

VarBeta

a variance-covariance matrix for trait BLUEs

U

a list (one element for each random effect) with a data frame for trait BLUPs.

VarU

a list (one element for each random effect) with the variance-covariance matrix for trait BLUPs.

PevU

a list (one element for each random effect) with the predicted error variance matrix for trait BLUPs.

fitted

Fitted values y.hat=XB

residuals

Residual values e = Y - XB

AIC

Akaike information criterion

BIC

Bayesian information criterion

convergence

a TRUE/FALSE statement indicating if the model converged.

monitor

The values of log-likelihood and variance-covariance components across iterations during the REML estimation.

percChange

dL

The vector of first derivatives of the likelihood with respect to the ith variance-covariance component.

dL2

The matrix of second derivatives of the likelihood with respect to the i.j th variance-covariance component.

method

The method for extimation of variance components specified by the user.

call

Formula for fixed, random and rcov used.

constraints

contraints used in the mixed models for the random effects.

constraintsF

contraints used in the mixed models for the fixed effects.

data

The dataset used in the model after removing missing records for the response variable.

dataOriginal

The original dataset used in the model.

terms

The name of terms for responses, fixed, random and residual effects in the model.

termsN

The number of effects associated to fixed, random and residual effects in the model.

sigmaVector

a vectorized version of the sigma element (variance-covariance components) to match easily the standard errors of the var-cov components stored in the element sigmaSE.

reshapeOutput

The value provided to the mmer function for the argument with the same name.