Type: | Package |
Title: | Shrinkage Covariance Incorporating Prior Knowledge |
Version: | 2.0.3 |
Description: | Implements estimation methods for shrinkage covariance matrices using user-specified covariance targets. The covariance target is a structured matrix towards which the unbiased sample covariance is shrunk, optionally incorporating prior knowledge. Shrinkage intensity is computed analytically. The method is described and applied to microarray gene expression data in Jelizarow et al. (2010) <doi:10.1093/bioinformatics/btq323>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | stats |
URL: | https://github.com/vguillemot/SHIP |
BugReports: | https://github.com/vguillemot/SHIP/issues |
NeedsCompilation: | no |
Packaged: | 2025-03-20 12:53:13 UTC; vguillem |
Author: | Vincent Guillemot [aut, cre], Monika Jelizarow [aut] |
Maintainer: | Vincent Guillemot <vincent.guillemot@pasteur.fr> |
Repository: | CRAN |
Date/Publication: | 2025-03-22 11:50:11 UTC |
SHrinkage covariance Incorporating Prior knowledge
Description
The SHIP-package implements the shrinkage estimator of a covariance matrix given any covariance target, such as described by Schaefer and Strimmer in 2005. In addition, it proposes several targets based on biological knowledge extracted from the public database KEGG.
Details
To use the shrinkage estimator, one should just have at hand a data set in
the form of a n \times p
matrix, and a covariance target.
If one wishes to use the proposed targets, the data set should be compatible with KEGG, i.e. it should be possible to extract for each gene the pathways it belongs to. This information, for example, can be found in libraries such as hgu133plus2.db.
Author(s)
Monika Jelizarow and Vincent Guillemot
References
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.
See Also
Useful links:
Examples
# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
sig1 <- shrink.estim(x,targetD(x))
sig2 <- shrink.estim(x,targetF(x))
sig3 <- shrink.estim(x,targetCor(x,genegroups))
sig4 <- shrink.estim(x,targetG(x,genegroups))
paste(sig1[[2]],collapse=" ")
paste(sig2[[2]],collapse=" ")
paste(sig3[[2]],collapse=" ")
paste(sig4[[2]],collapse=" ")
Creates a covariance target, optionally by using prior information (e.g. from KEGG pathways).
Description
The function 'build.target()' is a wrapper function to build the various types of covariance targets: diagonal ("D"), constant correlation ("F"), knowledge based ("G", "Gpos", and "Gstar"), correlation ("cor").
Usage
build.target(x, genegroups = NULL, type = "D")
Arguments
x |
An |
genegroups |
List of the groups each gene belongs to: each entry of the
list is dedicated to a gene (identified the same way as in |
type |
Character string specifying the wished target: "D" (by default) for a diagonal target, "cor" for a correlation target, "G", "Gpos" and "Gstar" for a G-type target (see Jelizarow et al, 2010) and "F" for a F-target. |
Value
A p \times p
target covariance matrix of a certain
type.
Author(s)
Vincent Guillemot and Monika Jelizarow
References
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.
See Also
targetCor
, targetD
,
targetF
, targetG
, targetGpos
,
targetGstar
,.
Examples
# Simulate dataset
x <- matrix(rnorm(20*30), 20, 30)
# Try different targets
build.target(x, type = "D")
Check if two genes belong to any common KEGG pathway.
Description
Takes as arguments two vectors of IDs and test whether they have a common ID.
Usage
check.path(p1, p2)
Arguments
p1 |
Vector of pathways that gene 1 belongs to. |
p2 |
Vector of pathways that gene 2 belongs to. |
Value
Return 0 if the two genes don't belong to a common pathway, return 1
otherwise. This is an internal function used by the function
target.help
.
Author(s)
Monika Jelizarow and Vincent Guillemot
See Also
Examples
g1 <- c("path1","path2","path3","path4")
g2 <- c("path5","path6","path3","path11")
g3 <- c("path10","path5","path12","path13")
check.path(g1, g2) # 1
check.path(g1, g3) # 0
Small example extracted from a microarray data set.
Description
The microarray data set is the study on the prostate cancer by Singh et al. The collection of the microarray is hgu95av2, and the gene groups are thus given by the information in the hgu95av2.db Bioconductor library (see Carslon et al.).
Usage
data("expl")
Format
The dataset is a list containing:
a
102 \times 100
matrixx
of 100 genes randomly chosen from the data set of Singh et al.,a list ‘genegroups’ containing 100 vectors of KEGG pathway IDs (which each gene belongs to).
Source
M. Carlson, S. Falcon, H. Pages, N. Li. hgu95av2.db: Affymetrix Human Genome U95 Set annotation data (chip hgu95av2). R package version 2.2.12.
D. Singh, P. G. Febbo, K. Ross, D. G. Jackson, J. Manola, C. Ladd, P. Tamayo, A. A. Renshaw, A. V. D'Amico, J. P. Richie, E. S. Lander, M. Loda, P. W. Kantoff, T. R. Golub, W. R. Sellers, 2002. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, Department of Adult Oncology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA., 1, 203-209.
Shrinkage estimator of the covariance matrix, given a data set and a covariance target.
Description
The shrinkage estimator is computed independently of the target's nature.
Usage
shrink.estim(x, tar)
Arguments
x |
A |
tar |
A |
Value
A p \times p
shrinkage covariance matrix and the
estimated \lambda
.
Author(s)
Monika Jelizarow and Vincent Guillemot
References
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
Examples
# Simulate dataset
x <- matrix(rnorm(20*30),20,30)
# Try different targets
shrink.estim(x, tar = build.target(x, type="D"))
shrink.estim(x, tar = build.target(x, type="D"))
Transform a list of Pathway IDs into a binary matrix.
Description
This function is an internal function used to transform the given list of
p
(one vector of pathway IDs per gene) groups into a binary matrix.
Usage
target.help(genes)
Arguments
genes |
List of |
Value
A p \times p
binary matrix: the coefficient (i,j) is 1
if genes i and j belong to a common pathway and 0 otherwise. This is an
internal function called by for example targetG
.
Author(s)
Monika Jelizarow and Vincent Guillemot
See Also
targetF
,targetG
,targetGpos
,
targetGstar
.
Examples
g1 <- c("path1", "path2", "path3", "path4")
g2 <- c("path5", "path6", "path3", "path11")
g3 <- c("path10", "path5", "path12", "path13")
target.help(list(g1, g2, g3))
Computation of the target Cor.
Description
The p \times p
target Cor is computed from the n \times
p
data matrix. It it a modified version of target G. In particular,
it tests the correlations (with a significance level of 0.05) and sets the
non-significant correlations to zero before the mean correlation
\bar{r}
is computed.
Usage
targetCor(x, genegroups)
Arguments
x |
A |
genegroups |
A list of genes obtained using the database KEGG, where each entry itself is a list of pathway names this genes belongs to. If a gene does not belong to any gene functional group, the entry is NA. |
Value
A p \times p
matrix.
Author(s)
Monika Jelizarow and Vincent Guillemot
References
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
See Also
targetCor
, targetF
,
targetG
, targetGstar
, targetGpos
.
Examples
# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
tar <- targetCor(x,genegroups)
which(tar[upper.tri(tar)]!=0) # not many non zero coefficients !
Computation of the diagonal target D ('diagonal, unequal variances').
Description
The p \times p
diagonal target D is computed from the n
\times p
data matrix. It is defined as follows (i,j =
1,...,p
):
t_{ij}=\begin{cases}s_{ii} & \text{ if } i=j \\ 0 & \text{ otherwise }\end{cases}
where
s_{ij}
denotes the entry of the unbiased covariance matrix in row
i
, column j
.
Usage
targetD(x, genegroups)
Arguments
x |
A |
genegroups |
The genegroups are not used for this target. |
Value
A p \times p
diagonal matrix.
Author(s)
Monika Jelizarow and Vincent Guillemot
References
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
See Also
targetCor
, targetF
,
targetG
, targetGstar
, targetGpos
.
Examples
x <- matrix(rnorm(10*30),10,30)
tar <- targetD(x,NULL)
Computation of target F ('constant correlation model').
Description
The p \times p
target F is computed from the n \times p
data matrix. It is defined as follows (i,j = 1,...,p
):
t_{ij} = \begin{cases}
s_{ii} & \text{ if } i=j \\
\bar{r}\sqrt{s_{ii}s_{jj} \text{ otherwise }}&
\end{cases}
where \bar{r}
is the average of
sample correlations and s_{ij}
denotes the entry of the unbiased
covariance matrix in row i
, column j
.
Usage
targetF(x, genegroups)
Arguments
x |
A |
genegroups |
The genegroups are not used for this target. |
Value
A p \times p
matrix.
Author(s)
Monika Jelizarow and Vincent Guillemot
References
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
See Also
targetCor
, targetF
,
targetG
, targetGstar
, targetGpos
.
Examples
# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
tar <- targetF(x,NULL)
which(tar[upper.tri(tar)]!=0) # many non zero coefficients !
Computation of target G ('knowledge-based constant correlation model').
Description
The p \times p
target G is computed from the n \times p
data matrix. It is defined as follows (i,j = 1,...,p
):
t_{ij} =
\begin{cases}
s_{ii} & \text{ if } i=j\\
\bar{r}\sqrt{s_{ii}s_{jj}} & \text{ if } i\neq j, i\sim j
\end{cases}
where \bar{r}
is the average of sample correlations and s_{ij}
denotes the
entry of the unbiased covariance matrix in row i
, column
j
. The notation i\sim j
means that genes i
and j
are connected, i.e. genes i
and j
are in
the same gene functional group.
Usage
targetG(x, genegroups)
Arguments
x |
A |
genegroups |
A list of genes obtained using the database KEGG, where each entry itself is a list of pathway names this genes belongs to. If a gene does not belong to any gene functional group, the entry is NA. |
Value
A p \times p
matrix.
Author(s)
Monika Jelizarow and Vincent Guillemot
References
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.
See Also
targetCor
, targetF
,
targetG
, targetGstar
, targetGpos
.
Examples
# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
tar <- targetG(x,genegroups)
which(tar[upper.tri(tar)]!=0) # not many non zero coefficients !
Computation of the target Gpos.
Description
The p \times p
target Gpos is computed from the n \times
p
data matrix. It it a modified version of target G. In particular,
it completely ignores negative correlations and computes the mean
correlation \bar{r}
using the positive ones only.
Usage
targetGpos(x, genegroups)
Arguments
x |
A |
genegroups |
A list of genes obtained using the database KEGG, where each entry itself is a list of pathway names this genes belongs to. If a gene does not belong to any gene functional group, the entry is NA. |
Value
A p \times p
matrix.
Author(s)
Monika Jelizarow and Vincent Guillemot
References
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.
See Also
targetCor
, targetF
,
targetG
, targetGstar
, targetGpos
.
Examples
# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
tar <- targetGpos(x,genegroups)
which(tar[upper.tri(tar)]!=0) # not many non zero coefficients !
Computation of the target Gstar.
Description
The p \times p
target Gstar is computed from the n \times
p
data matrix. It it a modified version of target G. In particular,
it involves two parameters for the correlation (a positive and a negative
one) instead of the single parameter \bar{r}
in order to account
for negatively correlated genes within the same pathway
Usage
targetGstar(x, genegroups)
Arguments
x |
A |
genegroups |
A list of genes obtained using the database KEGG, where each entry itself is a list of pathway names this genes belongs to. If a gene does not belong to any gene functional group, the entry is NA. |
Value
A p \times p
matrix.
Author(s)
Monika Jelizarow and Vincent Guillemot
References
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.
See Also
targetCor
, targetF
,
targetG
, targetGstar
, targetGpos
.
Examples
# A short example on a toy dataset
# require(SHIP)
data(expl)
attach(expl)
tar <- targetGstar(x,genegroups)
which(tar[upper.tri(tar)]!=0) # not many non zero coefficients !