Type: Package
Title: Shrinkage Covariance Incorporating Prior Knowledge
Version: 2.0.3
Description: Implements estimation methods for shrinkage covariance matrices using user-specified covariance targets. The covariance target is a structured matrix towards which the unbiased sample covariance is shrunk, optionally incorporating prior knowledge. Shrinkage intensity is computed analytically. The method is described and applied to microarray gene expression data in Jelizarow et al. (2010) <doi:10.1093/bioinformatics/btq323>.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: stats
URL: https://github.com/vguillemot/SHIP
BugReports: https://github.com/vguillemot/SHIP/issues
NeedsCompilation: no
Packaged: 2025-03-20 12:53:13 UTC; vguillem
Author: Vincent Guillemot [aut, cre], Monika Jelizarow [aut]
Maintainer: Vincent Guillemot <vincent.guillemot@pasteur.fr>
Repository: CRAN
Date/Publication: 2025-03-22 11:50:11 UTC

SHrinkage covariance Incorporating Prior knowledge

Description

The SHIP-package implements the shrinkage estimator of a covariance matrix given any covariance target, such as described by Schaefer and Strimmer in 2005. In addition, it proposes several targets based on biological knowledge extracted from the public database KEGG.

Details

To use the shrinkage estimator, one should just have at hand a data set in the form of a n \times p matrix, and a covariance target.

If one wishes to use the proposed targets, the data set should be compatible with KEGG, i.e. it should be possible to extract for each gene the pathways it belongs to. This information, for example, can be found in libraries such as hgu133plus2.db.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

See Also

Useful links:

Examples


# A short example on a toy dataset
# require(SHIP)

data(expl)
attach(expl)

sig1 <- shrink.estim(x,targetD(x))
sig2 <- shrink.estim(x,targetF(x))
sig3 <- shrink.estim(x,targetCor(x,genegroups))
sig4 <- shrink.estim(x,targetG(x,genegroups))

paste(sig1[[2]],collapse=" ")
paste(sig2[[2]],collapse=" ")
paste(sig3[[2]],collapse=" ")
paste(sig4[[2]],collapse=" ")


Creates a covariance target, optionally by using prior information (e.g. from KEGG pathways).

Description

The function 'build.target()' is a wrapper function to build the various types of covariance targets: diagonal ("D"), constant correlation ("F"), knowledge based ("G", "Gpos", and "Gstar"), correlation ("cor").

Usage

build.target(x, genegroups = NULL, type = "D")

Arguments

x

An n \times p matrix.

genegroups

List of the groups each gene belongs to: each entry of the list is dedicated to a gene (identified the same way as in x). Each item of the list is thus a vector of pathway IDs. Default value = 'NULL'.

type

Character string specifying the wished target: "D" (by default) for a diagonal target, "cor" for a correlation target, "G", "Gpos" and "Gstar" for a G-type target (see Jelizarow et al, 2010) and "F" for a F-target.

Value

A p \times p target covariance matrix of a certain type.

Author(s)

Vincent Guillemot and Monika Jelizarow

References

M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.

See Also

targetCor, targetD, targetF, targetG, targetGpos, targetGstar,.

Examples


# Simulate dataset
x <- matrix(rnorm(20*30), 20, 30)
# Try different targets
build.target(x, type = "D")


Check if two genes belong to any common KEGG pathway.

Description

Takes as arguments two vectors of IDs and test whether they have a common ID.

Usage

check.path(p1, p2)

Arguments

p1

Vector of pathways that gene 1 belongs to.

p2

Vector of pathways that gene 2 belongs to.

Value

Return 0 if the two genes don't belong to a common pathway, return 1 otherwise. This is an internal function used by the function target.help.

Author(s)

Monika Jelizarow and Vincent Guillemot

See Also

target.help

Examples


g1 <- c("path1","path2","path3","path4")
g2 <- c("path5","path6","path3","path11")
g3 <- c("path10","path5","path12","path13")
check.path(g1, g2) # 1
check.path(g1, g3) # 0

Small example extracted from a microarray data set.

Description

The microarray data set is the study on the prostate cancer by Singh et al. The collection of the microarray is hgu95av2, and the gene groups are thus given by the information in the hgu95av2.db Bioconductor library (see Carslon et al.).

Usage

data("expl")

Format

The dataset is a list containing:

Source


Shrinkage estimator of the covariance matrix, given a data set and a covariance target.

Description

The shrinkage estimator is computed independently of the target's nature.

Usage

shrink.estim(x, tar)

Arguments

x

A n \times p matrix (the data set) .

tar

A p \times p matrix (the covariance target).

Value

A p \times p shrinkage covariance matrix and the estimated \lambda.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

Examples


# Simulate dataset
x <- matrix(rnorm(20*30),20,30)
# Try different targets
shrink.estim(x, tar = build.target(x, type="D"))
shrink.estim(x, tar = build.target(x, type="D"))


Transform a list of Pathway IDs into a binary matrix.

Description

This function is an internal function used to transform the given list of p (one vector of pathway IDs per gene) groups into a binary matrix.

Usage

target.help(genes)

Arguments

genes

List of p items. Each item is the vector of Pathway IDs a gene belongs to.

Value

A p \times p binary matrix: the coefficient (i,j) is 1 if genes i and j belong to a common pathway and 0 otherwise. This is an internal function called by for example targetG.

Author(s)

Monika Jelizarow and Vincent Guillemot

See Also

targetF,targetG,targetGpos, targetGstar.

Examples


g1 <- c("path1", "path2", "path3", "path4")
g2 <- c("path5", "path6", "path3", "path11")
g3 <- c("path10", "path5", "path12", "path13")
target.help(list(g1, g2, g3)) 



Computation of the target Cor.

Description

The p \times p target Cor is computed from the n \times p data matrix. It it a modified version of target G. In particular, it tests the correlations (with a significance level of 0.05) and sets the non-significant correlations to zero before the mean correlation \bar{r} is computed.

Usage

targetCor(x, genegroups)

Arguments

x

A n \times p data matrix.

genegroups

A list of genes obtained using the database KEGG, where each entry itself is a list of pathway names this genes belongs to. If a gene does not belong to any gene functional group, the entry is NA.

Value

A p \times p matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples


# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
tar <- targetCor(x,genegroups)
which(tar[upper.tri(tar)]!=0) # not many non zero coefficients !

Computation of the diagonal target D ('diagonal, unequal variances').

Description

The p \times p diagonal target D is computed from the n \times p data matrix. It is defined as follows (i,j = 1,...,p):

t_{ij}=\begin{cases}s_{ii} & \text{ if } i=j \\ 0 & \text{ otherwise }\end{cases}

where s_{ij} denotes the entry of the unbiased covariance matrix in row i, column j.

Usage

targetD(x, genegroups)

Arguments

x

A n \times p data matrix.

genegroups

The genegroups are not used for this target.

Value

A p \times p diagonal matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples


x <- matrix(rnorm(10*30),10,30)
tar <- targetD(x,NULL)


Computation of target F ('constant correlation model').

Description

The p \times p target F is computed from the n \times p data matrix. It is defined as follows (i,j = 1,...,p):

t_{ij} = \begin{cases} s_{ii} & \text{ if } i=j \\ \bar{r}\sqrt{s_{ii}s_{jj} \text{ otherwise }}& \end{cases}

where \bar{r} is the average of sample correlations and s_{ij} denotes the entry of the unbiased covariance matrix in row i, column j.

Usage

targetF(x, genegroups)

Arguments

x

A n \times p data matrix.

genegroups

The genegroups are not used for this target.

Value

A p \times p matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples


# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
tar <- targetF(x,NULL)
which(tar[upper.tri(tar)]!=0) # many non zero coefficients !


Computation of target G ('knowledge-based constant correlation model').

Description

The p \times p target G is computed from the n \times p data matrix. It is defined as follows (i,j = 1,...,p):

t_{ij} = \begin{cases} s_{ii} & \text{ if } i=j\\ \bar{r}\sqrt{s_{ii}s_{jj}} & \text{ if } i\neq j, i\sim j \end{cases}

where \bar{r} is the average of sample correlations and s_{ij} denotes the entry of the unbiased covariance matrix in row i, column j. The notation i\sim j means that genes i and j are connected, i.e. genes i and j are in the same gene functional group.

Usage

targetG(x, genegroups)

Arguments

x

A n \times p data matrix.

genegroups

A list of genes obtained using the database KEGG, where each entry itself is a list of pathway names this genes belongs to. If a gene does not belong to any gene functional group, the entry is NA.

Value

A p \times p matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples


# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
tar <- targetG(x,genegroups)
which(tar[upper.tri(tar)]!=0) # not many non zero coefficients !


Computation of the target Gpos.

Description

The p \times p target Gpos is computed from the n \times p data matrix. It it a modified version of target G. In particular, it completely ignores negative correlations and computes the mean correlation \bar{r} using the positive ones only.

Usage

targetGpos(x, genegroups)

Arguments

x

A n \times p data matrix.

genegroups

A list of genes obtained using the database KEGG, where each entry itself is a list of pathway names this genes belongs to. If a gene does not belong to any gene functional group, the entry is NA.

Value

A p \times p matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples


# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
tar <- targetGpos(x,genegroups)
which(tar[upper.tri(tar)]!=0) # not many non zero coefficients !


Computation of the target Gstar.

Description

The p \times p target Gstar is computed from the n \times p data matrix. It it a modified version of target G. In particular, it involves two parameters for the correlation (a positive and a negative one) instead of the single parameter \bar{r} in order to account for negatively correlated genes within the same pathway

Usage

targetGstar(x, genegroups)

Arguments

x

A n \times p data matrix.

genegroups

A list of genes obtained using the database KEGG, where each entry itself is a list of pathway names this genes belongs to. If a gene does not belong to any gene functional group, the entry is NA.

Value

A p \times p matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples


# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
tar <- targetGstar(x,genegroups)
which(tar[upper.tri(tar)]!=0) # not many non zero coefficients !