Type: | Package |
Title: | Split Knockoffs for Structural Sparsity |
Version: | 2.1 |
Date: | 2024-10-13 |
Author: | Yuxuan Chen [aut, cre] (Development of the latest version of the packages), Haoxue Wang [aut] (Development of the first version of the packages), Yang Cao [aut] (Revision of this package), Xinwei Sun [aut] (Original ideas about the package), Yuan Yao [aut] (Testing for the package and management of the development) |
Maintainer: | Yuxuan Chen <yx.chen@connect.ust.hk> |
Description: | Split Knockoff is a data-adaptive variable selection framework for controlling the (directional) false discovery rate (FDR) in structural sparsity, where variable selection on a linear transformation of the parameters is of concern. The proposed scheme relaxes the linear subspace constraint to its neighborhood, often known as variable splitting in optimization. Simulation experiments can be reproduced by following the vignette. 'Split Knockoffs' was first defined in Cao et al. (2021) <doi:10.48550/arXiv.2103.16159>. |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
License: | MIT + file LICENSE |
Depends: | R (≥ 3.5.0) |
Imports: | glmnet, MASS, latex2exp, RSpectra, ggplot2, Matrix, stats, mvtnorm |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2024-10-14 11:16:35 UTC; 陈宇轩 |
Repository: | CRAN |
Date/Publication: | 2024-10-14 15:00:06 UTC |
singular value decomposition
Description
Computes a reduced SVD without sign ambiguity. Our convention is that the sign of each vector in U is chosen such that the coefficient with largest absolute value is positive.
Usage
canonicalSVD(X)
Arguments
X |
the input matrix |
Value
S
U
V
Examples
nu = 10
n = 350
m = 100
A_gamma <- rbind(matrix(0,n,m),-diag(m)/sqrt(nu))
svd.result = canonicalSVD(A_gamma)
S <- svd.result$S
S <- diag(S)
V <- svd.result$V
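The sign convention described for canonicalSVD can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: canonical_svd_sketch is a hypothetical name, and returning S as a plain vector of singular values is an assumption.

```r
# Hypothetical helper (not part of the package): reduced SVD with a fixed sign convention.
canonical_svd_sketch <- function(X) {
  s <- svd(X)  # base R reduced SVD: X = U diag(d) V^T
  # Sign of the largest-magnitude entry in each column of U
  signs <- apply(s$u, 2, function(u) sign(u[which.max(abs(u))]))
  signs[signs == 0] <- 1  # leave an all-zero column untouched
  # Flip matching columns of U and V together so the product U diag(d) V^T is unchanged
  U <- sweep(s$u, 2, signs, "*")
  V <- sweep(s$v, 2, signs, "*")
  list(S = s$d, U = U, V = V)  # S as a vector is an assumption of this sketch
}

set.seed(1)
X <- matrix(rnorm(20), 5, 4)
res <- canonical_svd_sketch(X)
recon <- res$U %*% diag(res$S) %*% t(res$V)
# Largest-magnitude entry of each column of U, which should now be positive
max_entries <- apply(res$U, 2, function(u) u[which.max(abs(u))])
```

Flipping a column of U together with the matching column of V leaves the reconstruction unchanged, which is why the convention removes the sign ambiguity without altering the decomposition.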
calculate the CV optimal beta
Description
cv_all calculates the cross-validation (CV) optimal beta in the problem 1/n |y - X beta|^2 + 1/nu |D beta - gamma|^2 + lambda |gamma|_1.
Usage
cv_all(X, y, D, option)
Arguments
X |
the design matrix |
y |
the response vector |
D |
the linear transform |
option |
options for screening |
Value
beta_hat: CV optimal beta
stat_cv: various intermediate statistics
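To make the objective in the Description concrete, the following sketch evaluates it for given beta and gamma. The function name split_lasso_objective is hypothetical, and the norms are assumed to be Euclidean (squared) for the first two terms and L1 for the last, as the notation suggests.

```r
# Hypothetical helper (not part of the package): evaluates
# 1/n |y - X beta|^2 + 1/nu |D beta - gamma|^2 + lambda |gamma|_1
split_lasso_objective <- function(X, y, D, beta, gamma, nu, lambda) {
  n <- nrow(X)
  fit_term   <- sum((y - X %*% beta)^2) / n       # data-fit term
  split_term <- sum((D %*% beta - gamma)^2) / nu  # relaxed constraint D beta = gamma
  l1_term    <- lambda * sum(abs(gamma))          # sparsity penalty on gamma
  fit_term + split_term + l1_term
}

# Tiny worked example
X <- diag(2)
y <- c(1, 1)
D <- diag(2)
obj <- split_lasso_objective(X, y, D, beta = c(1, 0), gamma = c(0, 0),
                             nu = 2, lambda = 1)
```

In this example the fit term is 1/2, the splitting term is 1/2, and the penalty is 0.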
calculate the CV optimal beta and estimated support set
Description
cv_screen calculates the cross-validation (CV) optimal beta and the estimated support set in the problem 1/n |y - X beta|^2 + 1/nu |D beta - gamma|^2 + lambda |gamma|_1.
Usage
cv_screen(X, y, D, option)
Arguments
X |
the design matrix |
y |
the response vector |
D |
the linear transform |
option |
options for screening |
Value
beta_hat: CV optimal beta
stat_cv: various intermediate statistics, including the estimated support sets
hitting point calculator on a given path
Description
Calculates the hitting time of a variable on a path and the sign of that variable at the hitting time.
Usage
hittingpoint(coef, lambdas)
Arguments
coef |
the path for one variable |
lambdas |
respective value of lambda in the path |
Value
Z: the hitting time
r: the sign of respective variable at the hitting time
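A hitting time can be sketched as the largest lambda at which the coefficient is nonzero on the path. The code below is an illustration only: hitting_point_sketch is a hypothetical name, and the tolerance and the value returned for a variable that never enters the path (Z = 0, r = 0) are assumptions that may differ from the package's conventions.

```r
# Hypothetical helper (not part of the package)
hitting_point_sketch <- function(coef, lambdas, tol = 1e-10) {
  nz <- which(abs(coef) > tol)            # path positions where the variable is active
  if (length(nz) == 0) {
    return(list(Z = 0, r = 0))            # never enters the path (assumed convention)
  }
  first <- nz[which.max(lambdas[nz])]     # largest lambda with a nonzero coefficient
  list(Z = lambdas[first], r = sign(coef[first]))
}

lambdas <- c(1, 0.5, 0.25, 0.125)         # decreasing regularization path
coef <- c(0, 0, -0.1, -0.3)               # coefficient of one variable along the path
hp <- hitting_point_sketch(coef, lambdas)
```

Here the variable first becomes nonzero at lambda = 0.25 with a negative sign.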
default normalization function for matrix
Description
normalize columns of a matrix.
Usage
normc(X)
Arguments
X |
the input matrix |
Value
Y: the output matrix
Examples
library(mvtnorm)
n = 350
p = 100
Sigma = diag(p)  # identity covariance; an all-zero Sigma would give an all-zero X
X <- rmvnorm(n,matrix(0, p, 1), Sigma)
X <- normc(X)
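Column normalization can be sketched in base R as follows. This assumes that "normalize" means scaling each column to unit Euclidean norm, which the documentation above does not state explicitly; normc_sketch is a hypothetical name.

```r
# Hypothetical helper (not part of the package): scale columns to unit Euclidean norm.
normc_sketch <- function(X) {
  norms <- sqrt(colSums(X^2))   # Euclidean norm of each column
  sweep(X, 2, norms, "/")       # divide every column by its norm
}

X <- matrix(c(3, 4, 0, 2), nrow = 2)   # columns (3, 4) and (0, 2)
Y <- normc_sketch(X)                   # columns (0.6, 0.8) and (0, 1)
```

A column with zero norm would produce NaN entries in this sketch, so the input should have no all-zero columns.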
split knockoff selector given W statistics
Description
split knockoff selector given W statistics
Usage
select(W, q, method = "knockoff+")
Arguments
W |
statistics W_j for testing null hypothesis |
q |
target FDR |
method |
either 'knockoff' or 'knockoff+'; the default is 'knockoff+' |
Value
S: array of selected variable indices
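The selection step can be illustrated with the standard knockoff threshold rule of Barber and Candès: pick the smallest t among the nonzero |W_j| such that (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q, with offset 1 for 'knockoff+' and 0 for 'knockoff', then select all j with W_j >= t. The function names below are hypothetical, and whether the package follows exactly this rule is an assumption.

```r
# Hypothetical helpers (not part of the package)
knockoff_threshold_sketch <- function(W, q, method = "knockoff+") {
  offset <- if (method == "knockoff+") 1 else 0
  candidates <- sort(unique(abs(W[W != 0])))   # candidate thresholds, ascending
  for (thr in candidates) {
    fdp_hat <- (offset + sum(W <= -thr)) / max(1, sum(W >= thr))
    if (fdp_hat <= q) return(thr)              # smallest threshold meeting the target
  }
  Inf                                          # no threshold works: select nothing
}

knockoff_select_sketch <- function(W, q, method = "knockoff+") {
  which(W >= knockoff_threshold_sketch(W, q, method))
}

W <- c(5, 4, 3, -1, 2, -0.5)
T_plus <- knockoff_threshold_sketch(W, q = 0.5)   # 'knockoff+' threshold
S_plus <- knockoff_select_sketch(W, q = 0.5)      # selected indices
```

For this W and q = 0.5, the threshold t = 0.5 gives an estimated ratio of 3/4 and is rejected, while t = 1 gives 2/4 and is accepted, so indices 1, 2, 3 and 5 are selected.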
W statistics generator based on a fixed beta(lambda) = hat beta
Description
Generates the split knockoff statistics W based on a fixed beta(lambda) = hat beta in the intercept assignment step.
Usage
sk.W_fixed(X, D, y, nu, option)
Arguments
X |
the design matrix |
D |
the linear transform |
y |
the response vector |
nu |
the parameter for variable splitting |
option |
options for creating the Knockoff statistics. option$lambda: the choice of lambda for the path. option$beta_hat: the choice of beta(lambda) = hat beta. |
Value
the split knockoff statistics W and various intermediate statistics
W statistics generator based on the beta(lambda) from a split LASSO path
Description
Generates the split knockoff statistics W based on the beta(lambda) from a split LASSO path in the intercept assignment step.
Usage
sk.W_path(X, D, y, nu, option)
Arguments
X |
the design matrix |
D |
the linear transform |
y |
the response vector |
nu |
the parameter for variable splitting |
option |
options for creating the Knockoff statistics. option$lambda: the choice of lambda for the path. |
Value
the split knockoff statistics W and various intermediate statistics
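As a rough illustration of how a W statistic compares a feature with its knockoff copy, the sketch below uses the generic signed-max form from the knockoff literature: W_j is large and positive when the original feature enters the path well before its knockoff. This is not the package's definition. The variants 's', 'st', 'bc' and 'bct' may be defined differently (for example, by also using the hitting signs), and signed_max_W is a hypothetical name.

```r
# Hypothetical helper (not part of the package).
# Z:       hitting times of the original features
# Z_tilde: hitting times of the corresponding knockoff features
signed_max_W <- function(Z, Z_tilde) {
  pmax(Z, Z_tilde) * sign(Z - Z_tilde)
}

Z       <- c(3, 1, 2)
Z_tilde <- c(1, 2, 2)
W <- signed_max_W(Z, Z_tilde)
```

A positive W_j is evidence for feature j, a negative W_j is evidence against it, and ties give W_j = 0.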
generate split knockoff copies
Description
Gives the variable splitting design matrix and response vector. It will also create a split knockoff copy if required.
Usage
sk.create(X, y, D, nu, option)
Arguments
X |
the design matrix |
y |
the response vector |
D |
the linear transform |
nu |
the parameter for variable splitting |
option |
options for creating the Knockoff copy. option$copy: if true, create a knockoff copy. |
Value
A_beta: the design matrix for beta after variable splitting
A_gamma: the design matrix for gamma after variable splitting
tilde_y: the response vector after variable splitting.
tilde_A_gamma: the knockoff copy of A_gamma; will be NULL if option$copy = false.
Examples
option <- list()
option$q <- 0.2
option$method <- 'knockoff'
option$normalize <- 'true'
option$lambda <- 10.^seq(0, -6, by=-0.01)
option$nu <- 10
option$copy <- 'true'
library(mvtnorm)
sigma <-1
p <- 100
D <- diag(p)
m <- nrow(D)
n <- 350
nu = 10
c = 0.5
Sigma = matrix(0, p, p)
for (i in 1:p) {
  for (j in 1:p) {
    Sigma[i, j] <- c^(abs(i - j))
  }
}
X <- rmvnorm(n,matrix(0, p, 1), Sigma)
beta_true <- matrix(0, p, 1)
varepsilon <- rnorm(n) * sqrt(sigma)
y <- X %*% beta_true + varepsilon
creat.result <- sk.create(X, y, D, nu, option)
A_beta <- creat.result$A_beta
A_gamma <- creat.result$A_gamma
tilde_y <- creat.result$tilde_y
tilde_A_gamma <- creat.result$tilde_A_gamma
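The variable splitting step can be sketched by stacking the two quadratic terms of the objective into a single least-squares problem. The form of A_gamma matches the one used in the canonicalSVD example; the scaling of A_beta and tilde_y is inferred from the objective 1/n |y - X beta|^2 + 1/nu |D beta - gamma|^2 and is an assumption, as is the hypothetical name split_design_sketch. The knockoff copy itself is not constructed here.

```r
# Hypothetical helper (not part of the package): lifted design after variable splitting.
split_design_sketch <- function(X, y, D, nu) {
  n <- nrow(X)
  m <- nrow(D)
  list(
    A_beta  = rbind(X / sqrt(n), D / sqrt(nu)),             # acts on beta
    A_gamma = rbind(matrix(0, n, m), -diag(m) / sqrt(nu)),  # acts on gamma
    tilde_y = c(y / sqrt(n), rep(0, m))                     # lifted response
  )
}

set.seed(1)
n <- 20
p <- 5
X <- matrix(rnorm(n * p), n, p)
D <- diag(p)
y <- rnorm(n)
beta <- rnorm(p)
gamma <- rnorm(p)
nu <- 10

sp <- split_design_sketch(X, y, D, nu)
# Residual sum of squares in the lifted problem ...
lifted <- sum((sp$tilde_y - sp$A_beta %*% beta - sp$A_gamma %*% gamma)^2)
# ... equals the two quadratic terms of the original objective
direct <- sum((y - X %*% beta)^2) / n + sum((D %*% beta - gamma)^2) / nu
```

Under these assumptions the lifted residual and the original quadratic terms agree for any beta and gamma, so the split problem is an ordinary LASSO in gamma on the lifted data.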
make SVD as well as orthogonal complements
Description
make SVD as well as orthogonal complements
Usage
sk.decompose(A, D)
Arguments
A |
the input matrix |
D |
the linear transform |
Value
U
S
V
U_perp: orthogonal complement for U
Examples
library(mvtnorm)
n = 350
p = 100
D <- diag(p)
Sigma = diag(p)  # identity covariance; an all-zero Sigma would give an all-zero X
X <- rmvnorm(n,matrix(0, p, 1), Sigma)
decompose.result <- sk.decompose(X, D)
U_perp <- decompose.result$U_perp
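An orthogonal complement of the left singular vectors can be obtained in base R from a complete QR decomposition. The sketch below shows the idea for a single matrix A only; how sk.decompose uses D is not described above, so D is left out here.

```r
set.seed(1)
A <- matrix(rnorm(18), 6, 3)   # tall matrix with full column rank
s <- svd(A)                    # reduced SVD: A = U diag(d) V^T
U <- s$u

# The complete Q factor of U is a 6 x 6 orthogonal matrix whose first
# ncol(U) columns span the column space of U; the remaining columns
# form an orthonormal basis of its orthogonal complement.
Q <- qr.Q(qr(U), complete = TRUE)
U_perp <- Q[, (ncol(U) + 1):nrow(U), drop = FALSE]
```

The columns of U_perp are orthonormal and orthogonal to every column of U, so cbind(U, U_perp) is a square orthogonal matrix.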
split Knockoff filter for structural sparsity problem
Description
split Knockoff filter for structural sparsity problem
Usage
sk.filter(X, D, y, option)
Arguments
X |
the design matrix |
D |
the linear transformation |
y |
the response vector |
option |
options for creating the Split Knockoff statistics. option$q: the desired FDR control target. option$beta: the choice of beta(lambda), one of: 'path', where beta(lambda) is taken from a regularization path; 'cv_beta', where beta(lambda) is taken as the cross-validation optimal estimator hat beta; or 'cv_all', where beta(lambda) and nu are taken as the cross-validation optimal estimators hat beta and hat nu. The default is 'cv_all'. option$lambda_cv: the set of lambda values used for cross validation in estimating hat beta, default 10.^seq(0, -8, by = -0.4). option$nu_cv: the set of nu values used for cross validation in estimating hat beta and hat nu, default 10.^seq(0, 2, by = 0.4). option$nu: the set of nu values used for Split Knockoffs when option$beta = 'path' or 'cv_beta', default 10.^seq(0, 2, by = 0.2). option$lambda: the set of lambda values used for the Split LASSO path calculation, default 10.^seq(0, -6, by = -0.01). option$normalize: whether to normalize the data, default true. option$W: the W statistics used for Split Knockoffs, one of 's', 'st', 'bc', 'bct'; default 'st'. |
Value
various intermediate statistics
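The documented defaults can be written out as an explicit option list. This is only a sketch: the value of option$q is borrowed from the sk.create example, passing 'true' as a string follows that same example and is an assumption for sk.filter, and the final call is left commented out because it needs the package and data.

```r
option <- list()
option$q <- 0.2                                # target FDR (value from the sk.create example)
option$beta <- 'cv_all'                        # documented default
option$lambda_cv <- 10^seq(0, -8, by = -0.4)   # lambda grid for cross validation
option$nu_cv <- 10^seq(0, 2, by = 0.4)         # nu grid for cross validation
option$nu <- 10^seq(0, 2, by = 0.2)            # nu grid for 'path' or 'cv_beta'
option$lambda <- 10^seq(0, -6, by = -0.01)     # lambda grid for the Split LASSO path
option$normalize <- 'true'                     # string form, as in the sk.create example
option$W <- 'st'                               # documented default

# With the package attached and X, D, y defined, the documented call is:
# result <- sk.filter(X, D, y, option)
```

Writing the grids out makes their sizes visible: the cross-validation grids are coarse, while the path grid for lambda is much finer.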
compute the threshold for variable selection
Description
compute the threshold for variable selection
Usage
threshold(W, q, method = "knockoff+")
Arguments
W |
statistics W_j for testing null hypothesis beta_j = 0 |
q |
target FDR |
method |
either 'knockoff' or 'knockoff+'; the default is 'knockoff+' |
Value
T: threshold for variable selection