Type: Package
Title: Semi-Parametric Estimation with Gaussian Copula
Version: 1.0.0
Description: A method for estimating the correlation matrix of the Gaussian copula from the observed data. This package also contains a penalized estimation of the corresponding precision matrix, and enables to generate random vectors that are distributed according to a Gaussian copula.
Imports: mvtnorm, stats, igraph, matrixcalc, graphics, foreach, stringr, doSNOW, utils, huge
License: GPL (≥ 3)
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Depends: R (≥ 2.10)
Suggests: knitr, rmarkdown, kableExtra, dplyr
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-06-06 09:08:54 UTC; etomilina
Author: Julie Cartier [aut], Florence Jaffrezic [aut], Gildas Mazo [aut], Ekaterina Tomilina [aut, cre]
Maintainer: Ekaterina Tomilina <ekaterina.tomilina@inrae.fr>
Repository: CRAN
Date/Publication: 2025-06-06 09:20:02 UTC

CopulaSim

Description

This function enables the user to simulate data from a Gaussian copula and arbitrary marginal quantile functions

Usage

CopulaSim(n, R, qdist, random = FALSE)

Arguments

n

the number of observations

R

a correlation matrix of size dxd

qdist

a vector containing the names of the marginal quantile functions as well as the number of times they are present in the dataset

random

a boolean defining whether the order of the correlation coefficients should be randomized

Value

a list containing an nxd data frame, the shuffled correlation matrix R, and the permutation leading to the new correlation matrix

Examples

M <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
CopulaSim(20,M,c(rep("qnorm(0,1)",6),rep("qexp(0.5)",4),rep("qbinom(4,0.8)",2)),random=TRUE)


cor_network_graph

Description

This function enables the user to plot the graph corresponding to the correlations of the Gaussian copula

Usage

cor_network_graph(R, TS, binary = TRUE, legend)

Arguments

R

a correlation matrix of size dxd (d is the number of variables)

TS

a threshold for the absolute values of the correlation matrix coefficients

binary

a boolean specifying whether the coefficients should be binarized, TRUE by defaut (zero if the coefficient is less than the threshold in absolute value, 1 otherwise). If FALSE, the edge width is proportional to the coefficient value.

legend

a vector containing the type of each variable used to color the vertices

Value

a graph representing the correlations between the latent Gaussian variables

Examples

R <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
data <- CopulaSim(20,R,c(rep("qnorm(0,1)",6),rep("qexp(0.5)",4),
rep("qbinom(4,0.8)",2)),random=FALSE)[[1]]
cor_network_graph(R,TS=0.3,binary=TRUE,legend=c(rep("Normal",6),
rep("Exponential",4),rep("Binomial",2)))


diag_block_matrix

Description

This function enables the user to generate a diagonal block-matrix with homogeneous blocks

Usage

diag_block_matrix(blocks, coeff)

Arguments

blocks

a vector containing the sizes of the blocks

coeff

a vector containing the coefficient corresponding to each block, the coefficients must be between 0 and 1

Value

a diagonal block-matrix containing the specified coefficients

Examples

diag_block_matrix(c(3,4,5),c(0.3,0.4,0.8))


gauss_gen

Description

This function enables the user to generate gaussian vectors with correlation matrix R

Usage

gauss_gen(R, n)

Arguments

R

a correlation matrix of size dxd

n

the number of observations

Value

a nxd data frame containing n observations of the d variables

Examples

M <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
gauss_gen(M,20)


ICGC dataset

Description

Dataset containing RNA counts, protein expression and mutations measured on breast cancer tumors.

Usage

icgc_data

Format

A dataframe of 15 variables and 250 observations containing the following:

ACACA, AKT1S1, ANLN,ANXA1,AR

RNA counts (discrete)

ACACA_P, AKT1S1_P, ANLN_P,ANXA_P,AR_P

protein expression measurements (discrete)

MU5219,MU4468,MU7870,MU4842,MU6962

5 mutations (binary)


matrix_cor_ts

Description

This function enables the user to threshold matrix coefficients

Usage

matrix_cor_ts(R, TS, binary = TRUE)

Arguments

R

a correlation matrix

TS

a threshold

binary

a boolean specifying whether the coefficients should be binarized, TRUE by defaut (zero if the coefficient is less than the threshold in absolute value, 1 otherwise)

Value

the thresholded input matrix

Examples

M <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
matrix_cor_ts(M,0.5)


matrix_gen

Description

This function enables the user to generate a sparse, nonnegative definite correlation matrix via the Cholesky decomposition

Usage

matrix_gen(d, gamma)

Arguments

d

the number of variables

gamma

an initial sparsity parameter for the lower triangular matrices in the Cholesky decomposition, must be between 0 and 1

Value

a list containing the generated correlation matrix and its final sparsity parameter (ie the proportion of zeros)

Examples

matrix_gen(15,0.81)


omega_estim

Description

This function enables the user estimate the precision matrix of the latent variables via gLasso inversion

Usage

omega_estim(data, Type, lambda, n)

Arguments

data

a dataset of size nxd or a correlation matrix R of size dxd

Type

a vector containing the type of the variables, "C" for continuous and "D" for discrete (in the case a data set is entered as the first parameter)

lambda

a grid of penalization parameters to be evaluated

n

the sample size used (in the case of a correlation matrix entered as the first parameter)

Value

a list containing the correlation matrix, the optimal precision matrix, the optimal lambda, the minimal HBIC, all values of lambda, all corresponding HBIC values

Examples

M <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
data <- CopulaSim(20,M,c(rep("qnorm(0,1)",6),rep("qexp(0.5)",4),
rep("qbinom(4,0.8)",2)),random=FALSE)[[1]]
## Not run: P <- omega_estim(data,c(rep("C",10),rep("D",2)),seq(0.01,1,0.05))


rho_estim

Description

This function enables the user to estimate the correlation matrix of the Gaussian copula for a given dataset

Usage

rho_estim(data, Type, ncores = 1)

Arguments

data

an nxd data frame containing n observations of d variables

Type

a vector containing the type of the variables, "C" for continuous and "D" for discrete

ncores

an integer specifying the number of cores to be used for parallel computation. "1" by default, leading to non-parallel computation.

Value

the dxd estimated correlation matrix of the Gaussian copula

Examples

M <- diag_block_matrix(c(3,4,5),c(0.7,0.8,0.2))
data <- CopulaSim(20,M,c(rep("qnorm(0,1)",6),rep("qexp(0.5)",4),
rep("qbinom(4,0.8)",2)),random=FALSE)[[1]]
rho_estim(data,c(rep("C",10),rep("D",2)))