Title: Hybrid Genetic and Simulated Annealing Algorithm for High Dimensional Linear Models with Interaction Effects
Version: 1.2.1
Description: We provide a stage-wise selection method using genetic algorithms, designed to efficiently identify main effects and two-way interactions in high-dimensional linear regression models. It also implements a simulated annealing algorithm during the mutation process. The relevant paper is Ye, C. and Yang, Y. (2019) <doi:10.1109/TIT.2019.2913417>.
License: GPL-2
Encoding: UTF-8
Imports: utils, Matrix, energy, pracma, stats, selectiveInference, VariableScreening, SIS
Language: en-US
Author: Leiyue Li [aut, cre], Chenglong Ye [aut]
Maintainer: Leiyue Li <lli289.git@gmail.com>
RoxygenNote: 7.3.1
NeedsCompilation: no
Packaged: 2024-04-04 14:33:29 UTC; lli28
Repository: CRAN
Date/Publication: 2024-04-04 15:03:00 UTC
ABC Evaluation
Description
Computes the ABC score of a fitted model. For a model I, the ABC is defined as
ABC(I)=\sum_{i=1}^n\left(Y_i-\hat{Y}_i^{I}\right)^2+2r_I\sigma^2+\lambda\sigma^2C_I.
When the ABC scores of models fitted to the same dataset are compared, the smaller the ABC, the better the fit.
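The formula above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name abc_score is hypothetical, and r_I and C_I are passed in by the caller because this section does not define them. The values used for r_I and C_I in the usage lines are placeholder assumptions chosen only to show the arithmetic.

```r
# Sketch of the ABC formula. r_I and C_I are supplied by the caller because
# their definitions are not given in this section (hypothetical helper).
abc_score <- function(y, y_hat, r_I, C_I, sigma, lambda = 10) {
  rss <- sum((y - y_hat)^2)                       # residual sum of squares
  rss + 2 * r_I * sigma^2 + lambda * sigma^2 * C_I
}

set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 + x2 + rnorm(n, sd = 0.5)

fit_small <- lm(y ~ x1)        # omits a true predictor
fit_full  <- lm(y ~ x1 + x2)   # matches the data-generating model

# Placeholder assumptions for illustration only:
# r_I = number of fitted coefficients, C_I = number of predictors.
score_small <- abc_score(y, fitted(fit_small), r_I = 2, C_I = 1, sigma = 0.5)
score_full  <- abc_score(y, fitted(fit_full),  r_I = 3, C_I = 2, sigma = 0.5)
c(small = score_small, full = score_full)
```

Under these assumptions the two scores are comparable because both models are fitted to the same y, which is the setting the description refers to.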
Usage
ABC(
X,
y,
heredity = "Strong",
sigma,
varind = NULL,
interaction.ind = NULL,
lambda = 10
)
Arguments
X |
Input data. An optional data frame, or numeric matrix of dimension
|
y |
Response variable. A |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
sigma |
The standard deviation of the noise term. In practice, sigma is usually
unknown. Users can estimate sigma from function |
varind |
A numeric vector that specifies the indices of variables to be extracted from |
interaction.ind |
A two-column numeric matrix. Each row represents a unique
interaction pair, with the columns indicating the index numbers of the variables
involved in each interaction. Note that interaction.ind must be generated
outside of this function using |
lambda |
A numeric value defined by users. The number needs to satisfy the condition:
|
Value
A numeric value is returned. It represents the ABC score of the fitted model.
References
Ye, C. and Yang, Y., 2019. High-dimensional adaptive minimax sparse estimation with interactions.
Examples
# When sigma is known
set.seed(0)
interaction.ind <- t(combn(4,2))
X <- matrix(rnorm(50*4,1,0.1), 50, 4)
epl <- rnorm(50,0,0.01)
y <- 1+X[,1]+X[,2]+X[,1]*X[,2] + epl
ABC(X, y, sigma = 0.01, varind = c(1,2,5), interaction.ind = interaction.ind)
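# When sigma is not known
# Build the full design matrix: all main effects plus all candidate interactions
full <- Extract(X, varind = seq_len(ncol(X) + nrow(interaction.ind)), interaction.ind)
sigma <- selectiveInference::estimateSigma(full, y)$sigmahat # Estimate sigma
# Reuse the estimate in place of the known value
ABC(X, y, sigma = sigma, varind = c(1,2,5), interaction.ind = interaction.ind)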
Performing crossover
Description
This function generates offspring from parents. Crossover is performed with a fixed probability of 0.6.
Usage
Crossover(X, myParent, EVAoutput, heredity = "Strong", r1, r2, numElite = 40)
Arguments
X |
Input data. An optional data frame, or numeric matrix of dimension
|
myParent |
A numeric matrix with dimension |
EVAoutput |
The output from function |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
r1 |
At most how many main effects do you want to include in your model?
For high-dimensional data, |
r2 |
At most how many interaction effects do you want to include in your model? |
numElite |
Number of elite parents. Default is 40. |
Value
Offspring. If crossover occurred, it returns a numeric matrix with dimensions choose(numElite,2) by r1+r2. Otherwise, numElite by r1+r2.
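The two possible output shapes can be illustrated with a small base R sketch. The function crossover_sketch is hypothetical, and its recombination rule (sampling from the union of two parents, with 0 treated as an empty slot) is an assumption made for illustration; the source only states the crossover probability and the dimensions of the result.

```r
# Sketch: pairwise crossover applied with a fixed probability.
# The recombination rule below is an assumption, not the package's rule.
crossover_sketch <- function(parents, prob = 0.6) {
  if (runif(1) >= prob) return(parents)           # no crossover: parents unchanged
  pairs <- t(combn(nrow(parents), 2))             # all choose(numElite, 2) pairs
  width <- ncol(parents)
  children <- matrix(0, nrow(pairs), width)
  for (k in seq_len(nrow(pairs))) {
    pool <- unique(c(parents[pairs[k, 1], ], parents[pairs[k, 2], ]))
    pool <- pool[pool != 0]                       # 0 is treated as an empty slot
    picked <- pool[sample.int(length(pool), min(width, length(pool)))]
    children[k, seq_along(picked)] <- picked      # fill slots with sampled effects
  }
  children
}

set.seed(0)
parents <- matrix(sample(1:20, 4 * 3), nrow = 4)  # 4 parents, 3 slots each
dim(crossover_sketch(parents, prob = 1))          # choose(4, 2) = 6 rows
dim(crossover_sketch(parents, prob = 0))          # parents returned as-is: 4 rows
```

Setting prob to 1 or 0 forces each branch, which makes the two row counts easy to check.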
Examples
set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01,
interaction.ind = interaction.ind)
myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)
Offsprings <- Crossover(X, myParent, EVAoutput, r1 = 5, r2 = 2)
Evaluating main and interaction effects
Description
This function ranks each main and interaction effect. It also calculates the ABC score for each potential interaction under the chosen heredity structure. If heredity = "No" and the number of potential interactions exceeds choose(1000,2), the distance correlation between each variable in X and y is calculated to reduce the running time.
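As a small illustration of how a heredity structure restricts the candidate interactions, the sketch below filters the rows of interaction.ind given the main effects currently in a model. It uses the standard definitions (strong: both parent main effects present; weak: at least one present; no: unrestricted). The function name allowed_interactions and its main argument are hypothetical, and this is not how EVA is implemented internally.

```r
# Sketch: candidate pairs allowed under each heredity structure, given the
# indices of the main effects in the model (hypothetical helper).
allowed_interactions <- function(interaction.ind, main, heredity = "Strong") {
  in1 <- interaction.ind[, 1] %in% main
  in2 <- interaction.ind[, 2] %in% main
  keep <- switch(heredity,
    Strong = in1 & in2,                           # both parent effects present
    Weak   = in1 | in2,                           # at least one parent effect present
    No     = rep(TRUE, nrow(interaction.ind)),    # no restriction
    stop("heredity must be 'Strong', 'Weak', or 'No'")
  )
  interaction.ind[keep, , drop = FALSE]
}

interaction.ind <- t(combn(4, 2))                 # all 6 pairs among 4 variables
allowed_interactions(interaction.ind, main = c(1, 2), heredity = "Strong")
nrow(allowed_interactions(interaction.ind, main = c(1, 2), heredity = "Weak"))
nrow(allowed_interactions(interaction.ind, main = c(1, 2), heredity = "No"))
```

With main effects 1 and 2, strong heredity leaves only the pair (1, 2), weak heredity leaves the five pairs that involve variable 1 or 2, and no heredity keeps all six.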
Usage
EVA(
X,
y,
heredity = "Strong",
r1,
sigma,
varind = NULL,
interaction.ind = NULL,
lambda = 10
)
Arguments
X |
Input data. An optional data frame, or numeric matrix of dimension
|
y |
Response variable. A |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
r1 |
At most how many main effects do you want to include in your model?
For high-dimensional data, |
sigma |
The standard deviation of the noise term. In practice, sigma is usually
unknown. Users can estimate sigma from function |
varind |
A numeric vector that specifies the indices of variables to be extracted from |
interaction.ind |
A two-column numeric matrix. Each row represents a unique
interaction pair, with the columns indicating the index numbers of the variables
involved in each interaction. Note that interaction.ind must be generated
outside of this function using |
lambda |
A numeric value defined by users. The number needs to satisfy the condition:
|
Value
A list with two components: ranked.mainpool, the ranked main effects; and ranked.intermat, a 4-column matrix containing the potential interactions ranked by ABC score.
Examples
# Strong heredity
set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01, interaction.ind = interaction.ind)
Extracting columns and generating required interaction effects from data
Description
This function simplifies the data preparation process by enabling users to extract specific columns from their dataset X, and automatically generating any necessary interaction effects based on varind.
Usage
Extract(X, varind, interaction.ind = NULL)
Arguments
X |
Input data. An optional data frame, or numeric matrix of dimension
|
varind |
A numeric vector that specifies the indices of variables to be
extracted from |
interaction.ind |
A two-column numeric matrix. Each row represents a unique
interaction pair, with the columns indicating the index numbers of the variables
involved in each interaction. Note that |
Value
A numeric matrix is returned.
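The indexing convention can be sketched in base R. The sketch assumes, based on the example in which varind = c(1,5) yields X1 and X1X2 for a four-column X, that indices 1 to p select main effects and index p + k selects the product of the k-th pair in interaction.ind. The function extract_sketch is hypothetical, and it omits the input checks (such as rejecting duplicated indices) that the real function performs.

```r
# Sketch of the inferred indexing convention (hypothetical helper):
# indices 1..p are main effects; index p + k is the k-th interaction pair.
extract_sketch <- function(X, varind, interaction.ind = NULL) {
  X <- as.matrix(X)
  p <- ncol(X)
  cols <- lapply(varind, function(j) {
    if (j <= p) {
      X[, j]                                      # main effect j
    } else {
      pair <- interaction.ind[j - p, ]            # (j - p)-th interaction pair
      X[, pair[1]] * X[, pair[2]]                 # elementwise product column
    }
  })
  do.call(cbind, cols)
}

set.seed(0)
X <- matrix(rnorm(20), ncol = 4)
interaction.ind <- t(combn(4, 2))
out <- extract_sketch(X, varind = c(1, 5), interaction.ind)
dim(out)                                          # one row per observation, two columns
```

Here the second column of out equals X[, 1] * X[, 2], because the first row of t(combn(4, 2)) is the pair (1, 2).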
Examples
# Generate interaction.ind
interaction.ind <- t(combn(4,2))
# Generate data
set.seed(0)
X <- matrix(rnorm(20), ncol = 4)
y <- X[, 2] + rnorm(5)
# Extract X1 and X1X2 from X1, ..., X4
Extract(X, varind = c(1,5), interaction.ind)
# Extract only the interaction X1X2 (index 5) from X1, ..., X4
Extract(X, varind = 5, interaction.ind)
# Extract using duplicated values
try(Extract(X, varind = c(1,1), interaction.ind)) # duplicated indices: this call fails
Creating initial parents
Description
This function generates the initial parents.
Usage
Initial(X, y, EVAoutput, heredity = "Strong", r1, r2, numElite = 40)
Arguments
X |
Input data. An optional data frame, or numeric matrix of dimension
|
y |
Response variable. A |
EVAoutput |
The output from function |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
r1 |
At most how many main effects do you want to include in your model?
For high-dimensional data, |
r2 |
At most how many interaction effects do you want to include in your model? |
numElite |
Number of elite parents. Default is 40. |
Value
Initial parents. A numeric matrix with dimensions numElite by r1+r2.
Examples
set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01,
interaction.ind = interaction.ind)
myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)
Performing mutation
Description
This function generates mutants from parents.
Usage
Mutation(
myParent,
EVAoutput,
r1,
r2,
initial.temp = 1000,
cooling.rate = 0.95,
X,
y,
heredity = "Strong",
sigma,
varind = NULL,
interaction.ind = NULL,
lambda = 10
)
Arguments
myParent |
A numeric matrix with dimension |
EVAoutput |
The output from function |
r1 |
At most how many main effects do you want to include in your model?
For high-dimensional data, |
r2 |
At most how many interaction effects do you want to include in your model? |
initial.temp |
Initial temperature. Default is 1000. |
cooling.rate |
A numeric value representing the rate at which the temperature decreases. Default is 0.95. |
X |
Input data. An optional data frame, or numeric matrix of dimension
|
y |
Response variable. A |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
sigma |
The standard deviation of the noise term. In practice, sigma is usually
unknown. Users can estimate sigma from function |
varind |
A numeric vector that specifies the indices of variables to be extracted from |
interaction.ind |
A two-column numeric matrix. Each row represents a unique
interaction pair, with the columns indicating the index numbers of the variables
involved in each interaction. Note that interaction.ind must be generated
outside of this function using |
lambda |
A numeric value defined by users. The number needs to satisfy the condition:
|
Value
Mutant. A numeric matrix with dimensions numElite by r1+r2.
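To show how arguments such as initial.temp and cooling.rate typically enter a simulated annealing step, the sketch below uses a Metropolis acceptance rule with geometric cooling. Both choices are assumptions for illustration; the source does not state which acceptance rule or cooling schedule Mutation uses, and the function accept_mutant is hypothetical.

```r
# Sketch: Metropolis acceptance with geometric cooling (both are assumptions).
accept_mutant <- function(score_parent, score_mutant, temp) {
  delta <- score_mutant - score_parent            # smaller score = better model
  delta <= 0 || runif(1) < exp(-delta / temp)     # always accept improvements
}

temp <- 1000                                      # plays the role of initial.temp
cooling.rate <- 0.95
temps <- numeric(5)
for (i in 1:5) {
  temps[i] <- temp                                # record the current temperature
  temp <- temp * cooling.rate                     # temperature shrinks each step
}
temps

set.seed(0)
accept_mutant(score_parent = 10, score_mutant = 5, temp = temps[1])  # improvement
```

In this scheme a worse mutant is accepted more readily at high temperature, and a cooling.rate closer to 1 slows the decline of the temperature.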
Examples
set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01,
interaction.ind = interaction.ind)
myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)
Mutation(myParent, EVAoutput, r1 = 5, r2 = 2, X = X, y = y,
sigma = 0.01, interaction.ind = interaction.ind)
Hybrid Genetic and Simulated Annealing Algorithm
Description
This is the main function of the package hySAINT. It implements both a genetic algorithm and simulated annealing. The simulated annealing technique is used within the mutation operator.
Usage
hySAINT(
X,
y,
heredity = "Strong",
r1,
r2,
sigma,
interaction.ind = NULL,
varind = NULL,
numElite = 40,
max.iter = 500,
initial.temp = 1000,
cooling.rate = 0.95,
lambda = 10
)
Arguments
X |
Input data. An optional data frame, or numeric matrix of dimension
|
y |
Response variable. A |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
r1 |
At most how many main effects do you want to include in your model?
For high-dimensional data, |
r2 |
At most how many interaction effects do you want to include in your model? |
sigma |
The standard deviation of the noise term. In practice, sigma is usually
unknown. Users can estimate sigma from function |
interaction.ind |
A two-column numeric matrix. Each row represents a unique
interaction pair, with the columns indicating the index numbers of the variables
involved in each interaction. Note that interaction.ind must be generated
outside of this function using |
varind |
A numeric vector that specifies the indices of variables to be extracted from |
numElite |
Number of elite parents. Default is 40. |
max.iter |
Maximum number of iterations. Default is 500. |
initial.temp |
Initial temperature. Default is 1000. |
cooling.rate |
A numeric value representing the rate at which the temperature decreases. Default is 0.95. |
lambda |
A numeric value defined by users. The number needs to satisfy the condition:
|
Value
An object with S3 class "hySAINT".
Final.variable.names |
Name of the selected effects. |
Final.variable.idx |
Index of the selected effects. |
Final.model.score |
Final Model ABC. |
All.iter.score |
Best ABC scores from initial parents and all iterations. |
See Also
ABC, EVA, Initial, Crossover, Mutation
Examples
set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
hySAINT(X, y, r1 = 5, r2 = 2, sigma = 0.01, interaction.ind = interaction.ind, max.iter = 5)