Type: | Package |
Title: | Empirical Bayesian Elastic Net |
Version: | 5.2 |
Date: | 2024-10-27 |
Maintainer: | Anhui Huang <anhuihuang@gmail.com> |
Description: | Provides the Empirical Bayesian Elastic Net for handling multicollinearity in generalized linear regression models. As a special case of the 'EBglmnet' package (also available on CRAN), this package encourages a grouping effect to select relevant variables and estimate the corresponding non-zero effects. |
License: | GPL-2 | GPL-3 [expanded from: GPL] |
Packaged: | 2024-10-27 16:46:39 UTC; anhui |
Depends: | R (≥ 2.10) |
NeedsCompilation: | yes |
Repository: | CRAN |
Date/Publication: | 2024-10-27 17:00:16 UTC |
Author: | Anhui Huang [aut, cre] |
Empirical Bayesian Elastic Net (EBEN)
Description
Fast EBEN algorithms.
EBEN implements normal and generalized gamma hierarchical priors.
( ** ) The two parameters (alpha, lambda) are equivalent to those of elastic net priors.
( ** ) When parameter alpha = 1, it is equivalent to EBlasso-NE (normal + exponential).
Two models are available for both methods:
( ** ) General linear regression model.
( ** ) Logistic regression model.
Multi-collinearity:
( ** ) For a group of highly correlated or collinear variables, EBEN identifies the group and estimates their effects together.
( ** ) A group of variables can be selected together.
*Epistasis (two-way interactions) can be included for all models/priors.
*Models are implemented in memory-efficient C code.
*LAPACK/BLAS are used for most linear algebra computations.
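To make the roles of alpha and lambda concrete, the sketch below evaluates the standard elastic net penalty in the glmnet-style parameterization. This is an illustration only: the function name elastic_net_penalty is hypothetical, and whether EBEN's hierarchical prior maps onto exactly this parameterization is an assumption, not something stated in this manual.

```r
# Standard elastic net penalty (glmnet-style parameterization).
# Assumption: used here only to illustrate alpha and lambda; not EBEN code.
elastic_net_penalty <- function(beta, lambda, alpha) {
  l1 <- sum(abs(beta))    # lasso (L1) part
  l2 <- sum(beta^2) / 2   # ridge (L2) part
  lambda * (alpha * l1 + (1 - alpha) * l2)
}

beta <- c(0.5, -1.0, 0.0, 2.0)
pen_lasso <- elastic_net_penalty(beta, lambda = 0.1, alpha = 1)    # L1 only
pen_mixed <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0.5)  # L1 + L2
```

With alpha = 1 the ridge term vanishes and only the lasso term remains, which matches the statement that alpha = 1 reduces to the lasso-type prior.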
Details
Package: | EBEN |
Type: | Package |
Version: | 5.2 |
Date: | 2015-10-06 |
License: | GPL |
Author(s)
Anhui Huang
References
key algorithms:
Cai, X., Huang, A., and Xu, S. (2011). Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping. BMC Bioinformatics 12, 211.
Huang A, Xu S, Cai X. (2013). Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC genetics 14(1):5.
Huang, A., Xu, S., and Cai, X. (2014). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity 10.1038/hdy.2014.79
Other publications:
Huang, A., E. Martin, et al. (2014). "Detecting genetic interactions in pathway-based genome-wide association studies." Genet Epidemiol 38(4): 300-309.
Huang, A., S. Xu, et al. (2014). "Whole-genome quantitative trait locus mapping reveals major role of epistasis on yield of rice." PLoS ONE 9(1): e87330.
Huang, A. (2014). "Sparse model learning for inferring genotype and phenotype associations." Ph.D Dissertation. University of Miami(1186).
An Example Data File for the Gauss Model
Description
This is a 1000x481 sample feature matrix
Usage
data(BASIS)
Format
The format is: int [1:1000, 1:481] 0 -1 0 0 1 0 1 0 1 0 ...
Details
The data were simulated on a 2400 cM chromosome; each column corresponds to an evenly spaced QTL.
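As a quick check of the layout described above, the following sketch computes the implied marker positions. It assumes that the 481 columns include markers at both ends of the chromosome; the variable names are illustrative and not part of the package.

```r
# Assumption: 481 evenly spaced markers spanning 0 to 2400 cM inclusive.
n_markers <- 481
chrom_len_cM <- 2400
pos_cM <- seq(0, chrom_len_cM, length.out = n_markers)

# Distance between adjacent markers, in cM.
spacing_cM <- unique(round(diff(pos_cM), 8))
```

Under that assumption, adjacent markers are 2400 / 480 = 5 cM apart.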
Source
Huang, A., Xu, S., and Cai, X. (2014). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity 10.1038/hdy.2014.79
Examples
data(BASIS)
An Example Data File for the Binomial Model
Description
This is a 500x481 sample feature matrix
Usage
data(BASISbinomial)
Format
The format is: int [1:500, 1:481] 0 -1 0 0 0 0 -1 -1 0 1 ...
Details
The data were simulated on a 2400 cM chromosome; each column corresponds to an evenly spaced QTL.
Source
Huang A, Xu S, Cai X: Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC genetics 2013, 14(1):5.
Examples
data(BASISbinomial)
Internal EBEN functions
Description
Internal EBEN functions
Usage
ijIndex(trueLoc,K)
CVonePair(X,y,nFolds,foldId,hyperpara,Epis,prior,family,verbose,group)
lambdaMax(X,y,Epis)
Details
These functions are not intended to be called directly by users.
ijIndex
Function for looking up the pair of interaction terms.
CVonePair
Function that performs nFolds-fold CV for one given pair of hyperparameters.
lambdaMax
Function that calculates the maximum lambda for EBlasso-NE and EBEN in CV.
Author(s)
Anhui Huang and Dianting Liu
Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
The EB Elastic Net Algorithm for Binomial Model with Normal-Gamma(NG) Prior Distribution
Description
Generalized linear regression, normal-gamma (NG) hierarchical prior for regression coefficients
Usage
EBelasticNet.Binomial(BASIS, Target, lambda, alpha,Epis = FALSE,verbose = 0)
Arguments
BASIS |
sample matrix; rows correspond to samples, columns correspond to features |
Target |
Class label of each individual; takes values 0 or 1 |
lambda |
Hyperparameter controlling the degree of shrinkage; can be obtained via cross validation; lambda>0 |
alpha |
Hyperparameter controlling the degree of shrinkage; can be obtained via cross validation; 0<alpha<1 |
Epis |
TRUE or FALSE for including two-way interactions |
verbose |
0 or 1; 1: display messages; 0: no messages |
Details
If Epis=TRUE, the program adds K*(K-1)/2 two-way interaction columns to BASIS, where K is the number of columns (features) in BASIS.
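The column count can be illustrated with a small base-R sketch that builds all pairwise products of the columns of a feature matrix. The helper add_pairwise_interactions is hypothetical and only mirrors the description above; the package builds its interaction terms internally, and its column ordering may differ.

```r
# Hypothetical helper: append all K*(K-1)/2 pairwise column products.
# This is a sketch of the idea, not the package's internal implementation.
add_pairwise_interactions <- function(X) {
  K <- ncol(X)
  pairs <- combn(K, 2)  # 2 x K(K-1)/2 matrix of column index pairs
  inter <- X[, pairs[1, ], drop = FALSE] * X[, pairs[2, ], drop = FALSE]
  colnames(inter) <- paste0("x", pairs[1, ], ":x", pairs[2, ])
  cbind(X, inter)
}

set.seed(1)
X <- matrix(sample(c(-1L, 0L, 1L), 5 * 4, replace = TRUE), nrow = 5)
X_epis <- add_pairwise_interactions(X)
dim(X_epis)  # 5 rows; 4 main-effect columns plus 4*3/2 = 6 interaction columns
```

Because the number of columns grows quadratically in K, enabling interactions on a wide matrix can increase run time and memory use substantially.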
Value
weight |
the non-zero regression coefficients: |
logLikelihood |
log likelihood from the final regression coefficients |
WaldScore |
Wald Score |
Intercept |
Intercept |
lambda |
the hyperparameter; same as input lambda |
alpha |
the hyperparameter; same as input alpha |
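For orientation, the sketch below computes the standard logistic (binomial) log likelihood from an intercept and a coefficient vector. It is a minimal illustration under the assumption that logLikelihood refers to this usual quantity; the function name binomial_loglik and the toy data are hypothetical, and the package may report a value computed differently.

```r
# Standard logistic log likelihood: sum(y * eta - log(1 + exp(eta))).
# Assumption: illustrative only; not taken from the EBEN source code.
binomial_loglik <- function(X, y, intercept, beta) {
  eta <- intercept + as.vector(X %*% beta)  # linear predictor
  sum(y * eta - log1p(exp(eta)))
}

X <- matrix(c(1, 0, -1, 1), ncol = 1)  # toy feature matrix, 4 samples
y <- c(1, 0, 0, 1)                     # toy 0/1 class labels

ll_null <- binomial_loglik(X, y, intercept = 0, beta = 0)      # 4 * log(0.5)
ll_fit  <- binomial_loglik(X, y, intercept = -0.2, beta = 0.7)
```

A larger (less negative) value indicates that the coefficients explain the observed labels better.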
Author(s)
Anhui Huang; Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
References
Huang A, Xu S, Cai X: Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC genetics 2013, 14(1):5.
Examples
library(EBEN)
data(BASISbinomial)
data(yBinomial)
#reduce sample size to speed up the running time
n = 50;
k = 100;
N = length(yBinomial);
set = sample(N,n);
BASIS = BASISbinomial[set,1:k];
y = yBinomial[set];
output = EBelasticNet.Binomial(BASIS, y,lambda = 0.1,alpha = 0.5, Epis = FALSE,verbose = 5)
Cross Validation (CV) Function to Determine Hyperparameter of the EB_Elastic Net Algorithm for Binomial Model with Normal-Gamma (NG) Prior Distribution
Description
The hyperparameters control the degree of shrinkage and are obtained via cross validation (CV). This program calculates the maximum lambda that allows one non-zero basis, then searches down to 0.001*lambda_max in 20 even steps.
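A grid of this kind can be sketched as follows. The sketch assumes the 20 steps are evenly spaced on the log scale, which is common for penalty grids but is not stated here; the steps could equally be linear. The function lambda_grid and the value of lambda_max are illustrative.

```r
# Assumption: 20 values evenly spaced on the log scale between
# lambda_max and ratio * lambda_max. Not the package's own code.
lambda_grid <- function(lambda_max, ratio = 0.001, n_steps = 20) {
  exp(seq(log(lambda_max), log(ratio * lambda_max), length.out = n_steps))
}

grid <- lambda_grid(lambda_max = 2)
range(grid)  # smallest and largest candidate lambda
```

Each candidate lambda would then be scored by cross validation, and the best-scoring value kept.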
Usage
EBelasticNet.BinomialCV(BASIS, Target, nFolds,foldId, Epis = FALSE, verbose = 0)
Arguments
BASIS |
sample matrix; rows correspond to samples, columns correspond to features |
Target |
Class label of each individual; takes values 0 or 1 |
nFolds |
number of folds for cross validation |
Epis |
TRUE or FALSE for including two-way interactions |
foldId |
random assignment of samples to folds |
verbose |
from 0 to 5; larger values display more messages |
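One way to construct a fold assignment is sketched below. It assumes that foldId is an integer vector with one fold label (1 to nFolds) per sample; that format is an assumption based on the argument description, and make_fold_id is a hypothetical helper, not a package function.

```r
# Hypothetical helper: balanced random assignment of n samples to nFolds folds.
# Assumption: foldId holds one integer label in 1..nFolds per sample.
make_fold_id <- function(n, nFolds) {
  sample(rep(seq_len(nFolds), length.out = n))
}

set.seed(1)
foldId <- make_fold_id(n = 50, nFolds = 3)
table(foldId)  # fold sizes differ by at most one sample
```

Fixing the seed before building foldId keeps the cross validation results reproducible across runs.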
Details
If Epis=TRUE, the program adds K*(K-1)/2 two-way interaction columns to BASIS
Value
CrossValidation |
col1: hyperparameter; col2: mean log likelihood; standard error of the n-fold mean log likelihood |
Lmabda_optimal |
the optimal hyperparameter as computed |
Alpha_optimal |
the optimal hyperparameter as computed |
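The selection rule implied by this table can be sketched on a made-up result matrix. The numbers below are invented purely for illustration, and the three-column layout (hyperparameter, mean log likelihood, standard error) is an assumption about how the CrossValidation element is organized.

```r
# Toy CV table with invented values; the layout is an assumption.
cv <- cbind(lambda    = c(1, 0.1, 0.01),
            logL_mean = c(-40.2, -31.5, -35.8),
            logL_se   = c(2.1, 1.8, 2.5))

# Keep the row with the highest mean cross-validated log likelihood.
best <- cv[which.max(cv[, "logL_mean"]), ]
best[["lambda"]]
```

The standard error column can be used to judge whether neighbouring hyperparameter values are practically indistinguishable.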
Author(s)
Anhui Huang; Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
References
Huang A, Xu S, Cai X: Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC genetics 2013, 14(1):5.
Examples
## not run
library(EBEN)
data(BASISbinomial)
data(yBinomial)
#reduce sample size to speed up the running time
n = 50;
k = 100;
N = length(yBinomial);
set.seed(1)
set = sample(N,n);
BASIS = BASISbinomial[set,1:k];
y = yBinomial[set];
nFolds = 3
## Not run:
CV = EBelasticNet.BinomialCV(BASIS, y, nFolds = 3,Epis = FALSE)
## End(Not run)
The EB Elastic Net Algorithm for Gaussian Model
Description
General linear regression, normal-gamma (NG) hierarchical prior for regression coefficients
Usage
EBelasticNet.Gaussian(BASIS, Target, lambda, alpha,Epis = FALSE,verbose = 0)
Arguments
BASIS |
sample matrix; rows correspond to samples, columns correspond to features |
Target |
Response of each individual |
lambda |
Hyperparameter controlling the degree of shrinkage; can be obtained via cross validation; lambda>0 |
alpha |
Hyperparameter controlling the degree of shrinkage; can be obtained via cross validation; 0<alpha<1 |
Epis |
TRUE or FALSE for including two-way interactions |
verbose |
0 or 1; 1: display messages; 0: no messages |
Details
If Epis=TRUE, the program adds K*(K-1)/2 two-way interaction columns to BASIS
Value
weight |
the non-zero regression coefficients: |
WaldScore |
Wald Score |
Intercept |
Intercept |
lambda |
the hyperparameter; same as input lambda |
alpha |
the hyperparameter; same as input alpha |
Author(s)
Anhui Huang; Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
References
Huang, A., Xu, S., and Cai, X. (2014). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity 10.1038/hdy.2014.79
Examples
library(EBEN)
data(BASIS)
data(y)
n = 50;
k = 100;
BASIS = BASIS[1:n,1:k];
y = y[1:n];
Blup = EBelasticNet.Gaussian(BASIS, y,lambda = 0.0072,alpha = 0.95, Epis = FALSE,verbose = 0)
betas = Blup$weight
betas
Cross Validation (CV) Function to Determine Hyperparameters of the EBEN Algorithm for Gaussian Model
Description
The hyperparameters control the degree of shrinkage and are obtained via cross validation (CV). This program calculates the maximum lambda that allows one non-zero basis, then searches down to 0.0001*lambda_max in 20 even steps.
Usage
EBelasticNet.GaussianCV(BASIS, Target, nFolds,foldId, Epis = FALSE, verbose = 0)
Arguments
BASIS |
sample matrix; rows correspond to samples, columns correspond to features |
Target |
Response of each individual |
nFolds |
number of folds for cross validation |
Epis |
TRUE or FALSE for including two-way interactions |
foldId |
random assignment of samples to folds |
verbose |
from 0 to 5; larger values display more messages |
Details
If Epis=TRUE, the program adds K*(K-1)/2 two-way interaction columns to BASIS
Value
CrossValidation |
col1: hyperparameter; col2: mean log likelihood; standard error of the n-fold mean log likelihood |
Lmabda_optimal |
the optimal hyperparameter as computed |
Alpha_optimal |
the optimal hyperparameter as computed |
Author(s)
Anhui Huang; Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
References
Huang, A., Xu, S., and Cai, X. (2014). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity 10.1038/hdy.2014.79
Examples
library(EBEN)
data(BASIS)
data(y)
#reduce sample size to speed up the running time
n = 50;
k = 100;
BASIS = BASIS[1:n,1:k];
y = y[1:n];
## Not run:
CV = EBelasticNet.GaussianCV(BASIS, y, nFolds = 3,Epis = FALSE)
## End(Not run)
The EBlasso Algorithm for Binomial Model with Normal-Exponential-Gamma (NEG) Prior Distribution
Description
Generalized linear regression, normal-exponential-gamma (NEG) hierarchical prior for regression coefficients
Usage
EBlassoNEG.Binomial(BASIS, Target, a_gamma, b_gamma, Epis,verbose,group)
Arguments
BASIS |
sample matrix; rows correspond to samples, columns correspond to features |
Target |
Class label of each individual; takes values 0 or 1 |
a_gamma |
Hyperparameter controlling the degree of shrinkage; can be obtained via cross validation; a_gamma>=-1 |
b_gamma |
Hyperparameter controlling the degree of shrinkage; can be obtained via cross validation; b_gamma>0 |
Epis |
TRUE or FALSE for including two-way interactions |
verbose |
0 or 1; 1: display messages; 0: no messages |
group |
0 or 1; 0: no group effect; 1: two-way interactions grouped. Only valid when Epis = TRUE |
Details
If Epis=TRUE, the program adds K*(K-1)/2 two-way interaction columns to BASIS
Value
weight |
the non-zero regression coefficients: |
logLikelihood |
log likelihood with the final regression coefficients |
WaldScore |
Wald Score |
Intercept |
Intercept |
a_gamma |
the hyperparameter; same as input |
b_gamma |
the hyperparameter; same as input |
Author(s)
Anhui Huang; Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
References
Huang A, Xu S, Cai X. (2013). Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC genetics 14(1):5.
Examples
library(EBEN)
data(BASISbinomial)
data(yBinomial)
#reduce sample size to speed up the running time
n = 50;
k = 100;
BASIS = BASISbinomial[1:n,1:k];
y = yBinomial[1:n];
output = EBlassoNEG.Binomial(BASIS,y,0.1,0.1,Epis = FALSE)
Cross Validation (CV) Function to Determine Hyperparameters of the EBlasso Algorithm for Binomial Model with Normal-Exponential-Gamma (NEG) Prior Distribution
Description
The hyperparameters control the degree of shrinkage and are obtained via cross validation (CV). This program performs three steps of CV:
1st: a = b = 0.001, 0.01, 0.1, 1;
2nd: fix b = b1 (the optimum from the 1st step); a = [-0.5, -0.4, -0.3, -0.2, -0.1, -0.01, 0.01, 0.05, 0.1, 0.5, 1];
3rd: fix a = a2 (the optimum from the 2nd step); b = 0.01 to 10, with a step size of one for b > 1 and a step size of one on the logarithmic scale for b < 1.
In the 2nd step, a can take values as low as -1; values in [-1, -0.5] can be added to the set in line 13 of this function. (The smaller a is, the less shrinkage.)
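The three-step search can be sketched in base R as below. The scoring function cv_score is a hypothetical stand-in for the cross-validated log likelihood (it simply peaks at a = 0.05, b = 2), so only the search logic and the grids reflect the description above; the real function scores each candidate by running the model on the folds.

```r
# Hypothetical stand-in for a cross-validated log likelihood.
# It peaks at a = 0.05, b = 2; it is NOT the package's scoring code.
cv_score <- function(a, b) -((a - 0.05)^2 + (log10(b) - log10(2))^2)

# Step 1: coarse search with a = b.
ab1 <- c(0.001, 0.01, 0.1, 1)
b1 <- ab1[which.max(mapply(cv_score, ab1, ab1))]

# Step 2: fix b = b1 and search over a.
a_grid <- c(-0.5, -0.4, -0.3, -0.2, -0.1, -0.01, 0.01, 0.05, 0.1, 0.5, 1)
a2 <- a_grid[which.max(sapply(a_grid, cv_score, b = b1))]

# Step 3: fix a = a2 and search over b
# (log-scale steps below 1, unit steps from 1 to 10).
b_grid <- c(0.01, 0.1, 1:10)
b_opt <- b_grid[which.max(sapply(b_grid, cv_score, a = a2))]

c(a_optimal = a2, b_optimal = b_opt)
```

The search evaluates 4 + 11 + 12 = 27 candidate pairs rather than the full 11 x 12 grid, which is the saving the stepwise scheme is designed to achieve.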
Usage
EBlassoNEG.BinomialCV(BASIS, Target, nFolds,foldId, Epis,verbose, group)
Arguments
BASIS |
sample matrix; rows correspond to samples, columns correspond to features |
Target |
Class label of each individual; takes values 0 or 1 |
nFolds |
number of folds for cross validation |
foldId |
random assignment of samples to folds |
Epis |
TRUE or FALSE for including two-way interactions |
verbose |
from 0 to 5; larger values display more messages |
group |
TRUE or FALSE; FALSE: no group effect; TRUE: two-way interactions grouped. Only valid when Epis = TRUE |
Details
If Epis=TRUE, the program adds K*(K-1)/2 two-way interaction columns to BASIS.
Note: Because the degree of shrinkage is a monotonic function of (a, b), the function implements the 3-step search described in Huang, A. (2014). For a full grid search, the user needs to modify the function accordingly.
Value
CrossValidation |
col1: hyperparameters; col2: mean log likelihood; standard error of the n-fold mean log likelihood |
a_optimal |
the optimal hyperparameter as computed |
b_optimal |
the optimal hyperparameter as computed |
Author(s)
Anhui Huang; Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
References
Huang A, Xu S, Cai X: Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC genetics 2013, 14(1):5.
Huang, A., S. Xu, et al. Whole-genome quantitative trait locus mapping reveals major role of epistasis on yield of rice. PLoS ONE 2014, 9(1): e87330.
Examples
library(EBEN)
data(BASISbinomial)
data(yBinomial)
#reduce sample size to speed up the running time
n = 50;
k = 100;
BASIS = BASISbinomial[1:n,1:k];
y = yBinomial[1:n];
## Not run:
CV = EBlassoNEG.BinomialCV(BASIS, y, nFolds = 3,Epis = FALSE, verbose = 0)
## End(Not run)
The EBlasso Algorithm for Gaussian Model with Normal-Exponential-Gamma (NEG) Prior Distribution
Description
General linear regression, normal-exponential-gamma (NEG) hierarchical prior for regression coefficients
Usage
EBlassoNEG.Gaussian(BASIS, Target, a_gamma, b_gamma, Epis, verbose, group)
Arguments
BASIS |
sample matrix; rows correspond to samples, columns correspond to features |
Target |
Response of each individual |
a_gamma |
Hyperparameter controlling the degree of shrinkage; can be obtained via cross validation |
b_gamma |
Hyperparameter controlling the degree of shrinkage; can be obtained via cross validation |
Epis |
TRUE or FALSE for including two-way interactions |
verbose |
from 0 to 5; larger values display more messages |
group |
0 or 1; 0: no group effect; 1: two-way interactions grouped. Only valid when Epis = TRUE |
Details
If Epis=TRUE, the program adds K*(K-1)/2 two-way interaction columns to BASIS.
For memory efficiency, the function passes n_effect to C. n_effect is a rough guess of how many variables the function will select, and it should exceed the number of true effects (n_effect > n_true).
By providing a relatively small n_effect, the function avoids allocating a large chunk of memory during computation.
Value
weight |
the non-zero regression coefficients: |
WaldScore |
Wald Score |
Intercept |
Intercept |
residVar |
residual variance |
a_gamma |
the hyperparameter; same as input |
b_gamma |
the hyperparameter; same as input |
Author(s)
Anhui Huang; Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
References
Cai, X., Huang, A., and Xu, S. (2011). Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping. BMC Bioinformatics 12, 211.
Examples
library(EBEN)
data(BASIS)
data(y)
n = 50;
k = 100;
BASIS = BASIS[1:n,1:k];
y = y[1:n];
output = EBlassoNEG.Gaussian(BASIS, y, a_gamma = 0.1, b_gamma = 0.1)
Cross Validation (CV) Function to Determine Hyperparameters of the EBlasso Algorithm for Gaussian Model with Normal-Exponential-Gamma (NEG) Prior Distribution
Description
The hyperparameters control the degree of shrinkage and are obtained via cross validation (CV). This program performs three steps of CV:
1st: a = b = 0.001, 0.01, 0.1, 1;
2nd: fix b = b1 (the optimum from the 1st step); a = [-0.5, -0.4, -0.3, -0.2, -0.1, -0.01, 0.01, 0.05, 0.1, 0.5, 1];
3rd: fix a = a2 (the optimum from the 2nd step); b = 0.01 to 10, with a step size of one for b > 1 and a step size of one on the logarithmic scale for b < 1.
In the 2nd step, a can take values as low as -1; values in [-1, -0.5] can be added to the set in line 13 of this function. (The smaller a is, the less shrinkage.)
Usage
EBlassoNEG.GaussianCV(BASIS, Target, nFolds, foldId, Epis,verbose, group)
Arguments
BASIS |
sample matrix; rows correspond to samples, columns correspond to features |
Target |
Response of each individual |
nFolds |
number of folds for cross validation |
foldId |
random assignment of samples to folds |
Epis |
TRUE or FALSE for including two-way interactions |
verbose |
from 0 to 5; larger values display more messages |
group |
TRUE or FALSE; FALSE: no group effect; TRUE: two-way interactions grouped. Only valid when Epis = TRUE |
Details
If Epis=TRUE, the program adds K*(K-1)/2 two-way interaction columns to BASIS.
Note: Because the degree of shrinkage is a monotonic function of (a, b), the function implements the 3-step search described in Huang, A. (2014). For a full grid search, the user needs to modify the function accordingly.
Value
CrossValidation |
col1: hyperparameters; col2: mean log likelihood; standard error of the n-fold mean log likelihood |
a_optimal |
the optimal hyperparameter as computed |
b_optimal |
the optimal hyperparameter as computed |
Author(s)
Anhui Huang; Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL
References
Huang A, Xu S, Cai X: Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC genetics 2013, 14(1):5.
Huang, A., S. Xu, et al. Whole-genome quantitative trait locus mapping reveals major role of epistasis on yield of rice. PLoS ONE 2014, 9(1): e87330.
Examples
library(EBEN)
data(BASIS)
data(y)
#reduce sample size to speed up the running time
n = 50;
k = 100;
BASIS = BASIS[1:n,1:k];
y = y[1:n];
## Not run:
CV = EBlassoNEG.GaussianCV(BASIS, y, nFolds = 3,Epis = FALSE)
## End(Not run)
Sample Response Data for Gaussian Model
Description
The response vector corresponding to BASIS
Usage
data(y)
Format
The format is: num [1:1000, 1] 113.5 97.1 116.6 96.7 105.5 ...
Source
Huang, A., Xu, S., and Cai, X. (2014). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity 10.1038/hdy.2014.79
Examples
data(y)
Sample Variable Data for Binomial Model
Description
The class labels corresponding to BASISbinomial
Usage
data(yBinomial)
Format
The format is: int [1:500, 1] 1 1 1 1 1 1 1 1 1 1 ...
Source
Huang A, Xu S, Cai X: Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC genetics 2013, 14(1):5.
Examples
data(yBinomial)