Help for package itdr

Type:

Package

Title:

Integral Transformation Methods for SDR in Regression

Version:

2.0.1

Depends:

R(≥ 3.5.0)

Imports:

stats,utils,MASS,geigen,magic,energy,tidyr

Description:

The itdr() routine estimates sufficient dimension reduction subspaces in univariate regression, such as the central mean subspace or the central subspace. This is achieved using Fourier transformation methods proposed by Zhu and Zeng (2006) <doi:10.1198/016214506000000140>, convolution transformation methods proposed by Zeng and Zhu (2010) <doi:10.1016/j.jmva.2009.08.004>, and iterative Hessian transformation methods proposed by Cook and Li (2002) <doi:10.1214/aos/1021379861>. Additionally, the mitdr() function provides optimal estimators for sufficient dimension reduction subspaces in multivariate regression by optimizing a discrepancy function using a Fourier transform approach proposed by Weng and Yin (2022) <doi:10.5705/ss.202020.0312>, and selects the sufficient variables using Fourier transform sparse inverse regression estimators proposed by Weng (2022) <doi:10.1016/j.csda.2021.107380>.

License:

GPL-2 | GPL-3

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.2.3

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0)

VignetteBuilder:

knitr

Config/testthat/edition:

NeedsCompilation:

yes

Packaged:

2024-02-25 17:55:58 UTC; talwis

Author:

Tharindu P. De Alwis [aut, cre], S. Yaser Samadi [ctb, aut], Jiaying Weng [ctb, aut]

Maintainer:

Tharindu P. De Alwis <talwis@wpi.edu>

Repository:

CRAN

Date/Publication:

2024-02-26 13:40:02 UTC

Automobiles Data

Description

This data set contains details about automobiles, sourced from the 1985 Ward's Automotive Yearbook.

Usage

data(automobile)

Format

A dataset consisting of 205 observations and 26 attributes, including:

symboling: -3, -2, -1, 0, 1, 2, 3.
normalized: continuous values, ranging from 65 to 256.
make: alfa-romero, audi, bmw, chevrolet, dodge, honda, isuzu, jaguar, mazda, mercedes-benz, mercury, mitsubishi, nissan, peugot, plymouth, porsche, renault, saab, subaru, toyota, volkswagen, volvo.
fuel-type: diesel, gas.
aspiration: std, turbo.
num-of-doors: four, two.
body-style: hardtop, wagon, sedan, hatchback, convertible.
drive-wheels: 4wd, fwd, rwd.
engine-location: front, rear.
wheel-base: continuous values, ranging from 86.6 to 120.9.
length: continuous values, ranging from 141.1 to 208.1.
width: continuous values, ranging from 60.3 to 72.3.
height: continuous values, ranging from 47.8 to 59.8.
curb-weight: continuous values, ranging from 1488 to 4066.
engine-type: dohc, dohcv, l, ohc, ohcf, ohcv, rotor.
num-of-cylinders: eight, five, four, six, three, twelve, two.
engine-size: continuous values, ranging from 61 to 326.
fuel-system: 1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi.
bore: continuous values, ranging from 2.54 to 3.94.
stroke: continuous values, ranging from 2.07 to 4.17.
compression-ratio: continuous values, ranging from 7 to 23.
horsepower: continuous values, ranging from 48 to 288.
peak-rpm: continuous values, ranging from 4150 to 6600.
city-mpg: continuous values, ranging from 13 to 49.
highway-mpg: continuous values, ranging from 16 to 54.
price: continuous values, ranging from 5118 to 45400.

Source

https://archive.ics.uci.edu/ml/datasets/automobile

Bootstrap Estimation for Dimension (d) of Sufficient Dimension Reduction Subspaces.

Description

The “d.boots()” function estimates the dimension of the central mean subspace and of the central subspace in regression.

Usage

d.boots(y,x,wx=0.1,wy=1,wh=1.5,B=500,var_plot=FALSE,space="mean",
xdensity="normal",method="FM")

Arguments

y

The n-dimensional response vector.

x

The design matrix of the predictors with dimension n-by-p.

wx

(default 0.1). The tuning parameter for predictor variables.

wy

(default 1). The tuning parameter for the response variable.

wh

(default 1.5). The bandwidth of the kernel density estimation.

B

(default 500). Number of bootstrap samples.

var_plot

(default FALSE). If TRUE, it provides the dimension variability plot.

space

(default “mean”). Use “mean” for estimating the central mean subspace. The other option is “pdf” for estimating the central subspace.

xdensity

(default “normal”). Density function of predictor variables. Options are “normal” for multivariate normal distribution, “elliptic” for elliptical contoured distribution function, or “kernel” for estimating the distribution using kernel smoothing.

method

(default “FM”). The integral transformation method. “FM” for the Fourier transformation method (Zhu and Zeng 2006), and “CM” for the convolution transformation method (Zeng and Zhu 2010).

Value

The output includes a table of average bootstrap distances between two subspaces for each candidate value of d, and the estimated value of d.

dis_d

A table of average bootstrap distances for each candidate value of d.

d.hat

The estimated value for d.

plot

Provides the dimension variability plot if var_plot=TRUE.
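The bootstrap idea behind the returned table can be sketched in base R. This is only an illustration under stated assumptions, not the d.boots() implementation: est_basis() is a hypothetical stand-in estimator (leading eigenvectors of a Fourier-type inverse regression matrix on a fixed grid of omegas), the data are simulated, and the rule of taking the candidate with the smallest average distance is an assumption that the text above does not state explicitly.

```r
set.seed(2024)
n <- 150
p <- 4
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(scale(x[, 1] + 0.2 * rnorm(n)))  # simulated response

# Hypothetical stand-in estimator (NOT itdr()): leading d eigenvectors of a
# Fourier-type inverse regression matrix built on a fixed grid of omegas.
est_basis <- function(y, x, d, omega = c(0.5, 1, 1.5)) {
  z <- scale(x)
  psi <- sapply(omega, function(w) colMeans(exp(1i * w * y) * z))
  Psi <- cbind(Re(psi), Im(psi))
  eigen(Psi %*% t(Psi), symmetric = TRUE)$vectors[, 1:d, drop = FALSE]
}

# Average distance (one minus the trace correlation) between the
# full-sample basis and B bootstrap bases.
boot_dist <- function(y, x, d, B = 50) {
  full <- est_basis(y, x, d)
  mean(replicate(B, {
    idx <- sample(nrow(x), replace = TRUE)
    boot <- est_basis(y[idx], x[idx, , drop = FALSE], d)
    lam2 <- svd(crossprod(full, boot))$d^2  # squared cosines of principal angles
    1 - sqrt(mean(lam2))
  }))
}

candidates <- 1:3
dis_d <- sapply(candidates, function(d) boot_dist(y, x, d))
d_hat <- candidates[which.min(dis_d)]  # assumed selection rule
```

Here dis_d plays the role of the table of average bootstrap distances and d_hat the role of the estimated dimension.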

Examples


# Use dataset available in itdr package
data(automobile)
head(automobile)
automobile.na <- na.omit(automobile)
# prepare response and predictor variables
auto_y <- log(automobile.na[, 26])
auto_xx <- automobile.na[, c(10, 11, 12, 13, 14, 17, 19, 20, 21, 22, 23, 24, 25)]
auto_x <- scale(auto_xx) # Standardize the predictors
# call to the d.boots() function with required arguments
d_est <- d.boots(auto_y, auto_x, var_plot = TRUE, space = "pdf", xdensity = "normal", method = "FM")
auto_d <- d_est$d.hat

Dimension Selection Testing Methods for the Central Mean Subspace.

Description

The “d.test()” function provides p-values for hypothesis tests on the dimension of the subspace. It employs three test statistics: Cook's test, the scaled test, and the adjusted test, using the Fourier transform approach for the inverse dimension reduction method.

Usage

d.test(y,x,m)

Arguments

y

The n-dimensional response vector.

x

The design matrix of the predictors with dimension n-by-p.

m

An integer specifying the dimension of the central mean reduction subspace to be tested.

Details

The null and alternative hypotheses are

H_0: d=m

H_a: d>m

1. Weighted Chi-Square test statistics (Weng and Yin, 2018):

\hat{\Lambda}=n\sum_{j=m+1}^{p}\hat{\lambda}_j,

where the \hat{\lambda}_j's are the eigenvalues of \widehat{\textbf{V}}, defined under the “invFM” method of the “itdr()” function.

2. Scaled test statistic (Bentler and Xie, 2000):

\overline{T}_m=[trace(\hat{\Omega}_n)/p^{\star}]^{-1}n\sum_{j=m+1}^{p}\hat{\lambda}_j \sim \chi^2_{p^{\star}},

where \hat{\Omega}_n is a covariance matrix, and p^{\star} = (p-m)(2t-m).

3. Adjusted test statistic (Bentler and Xie, 2000):

\tilde{T}_m=[trace(\hat{\Omega}_n)/d^{\star}]^{-1}n\sum_{j=m+1}^{p}\hat{\lambda}_j \sim \chi^2_{d^{\star}},

where d^{\star} = [trace(\hat{\Omega}_n)]^{2}/trace(\hat{\Omega}_n^2) .
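The scaled and adjusted statistics above can be evaluated directly once the eigenvalues and the covariance matrix are available. The following is a minimal sketch in base R, assuming hypothetical values for the eigenvalues \hat{\lambda}_j and for \hat{\Omega}_n (taken here as a p^{\star}-by-p^{\star} diagonal matrix purely for illustration); it is not the internal code of d.test().

```r
n <- 200      # sample size (hypothetical)
p <- 4        # number of predictors
m <- 1        # dimension under H_0
t_num <- 3    # number of omegas, written t in the formulas above

lam <- c(0.60, 0.05, 0.03, 0.01)           # hypothetical ordered eigenvalues of V-hat
stat <- n * sum(lam[(m + 1):p])            # n * sum_{j = m+1}^{p} lambda_j

p_star <- (p - m) * (2 * t_num - m)        # degrees of freedom of the scaled test
Omega <- diag(seq(0.5, 1.9, length.out = p_star))  # hypothetical covariance matrix

tr1 <- sum(diag(Omega))                    # trace(Omega)
tr2 <- sum(diag(Omega %*% Omega))          # trace(Omega^2)

# Scaled test: [trace(Omega) / p_star]^{-1} * stat ~ chi^2 with p_star df
T_scaled <- stat / (tr1 / p_star)
p_scaled <- pchisq(T_scaled, df = p_star, lower.tail = FALSE)

# Adjusted test: [trace(Omega) / d_star]^{-1} * stat ~ chi^2 with d_star df
d_star <- tr1^2 / tr2
T_adj <- stat / (tr1 / d_star)
p_adj <- pchisq(T_adj, df = d_star, lower.tail = FALSE)
```

Note that d_star need not be an integer; pchisq() accepts fractional degrees of freedom.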

Value

The d.test() function returns a table of p-values for each test.

References

Bentler P. M., and Xie, J. (2000). Corrections to Test Statistics in Principal Hessian Directions. Statistics and Probability Letters. 47, 381-389.

Weng J., and Yin X. (2018). Fourier Transform Approach for Inverse Dimension Reduction Method. Journal of Nonparametric Statistics. 30, 4, 1029-0311.

Examples

data(pdb)
colnames(pdb) <- NULL
p <- 15
# column 79 is used as the response; the other 15 columns are the predictors
df <- pdb[, c(79, 73, 77, 103, 112, 115, 124, 130, 132, 145, 149, 151, 153, 155, 167, 169)]
dff <- as.matrix(df)
planingdb <- dff[complete.cases(dff), ] # keep complete rows only
y <- planingdb[, 1]
x <- planingdb[, c(2:(p + 1))]
x <- x + 0.5
# power-transformed predictors
xt <- cbind(
  x[, 1]^(.33), x[, 2]^(.33), x[, 3]^(.57), x[, 4]^(.33), x[, 5]^(.4),
  x[, 6]^(.5), x[, 7]^(.33), x[, 8]^(.16), x[, 9]^(.27), x[, 10]^(.5),
  x[, 11]^(.5), x[, 12]^(.33), x[, 13]^(.06), x[, 14]^(.15), x[, 15]^(.1)
)
m <- 1 # dimension under the null hypothesis H_0: d = m
W <- sapply(1, rnorm)
d.test(y, x, m)

Distance Between Two Subspaces.

Description

The “dsp()” function calculates the distance between two subspaces, which are spanned by the columns of two matrices.

Usage

dsp(A, B)

Arguments

A

A matrix with dimension p-by-d.

B

A matrix with dimension p-by-d.

Details

Let A and B be two full rank matrices of size p \times d. Suppose \mathcal{S}(\textbf{A}) and \mathcal{S}(\textbf{B}) are the column subspaces of the matrices A and B, respectively, and let \lambda_i^2, with 1 \geq \lambda_1^2 \geq \lambda_2^2 \geq \cdots \geq \lambda_p^2 \geq 0, be the eigenvalues of the matrix \textbf{B}^T\textbf{A}\textbf{A}^T\textbf{B}.

1.Trace correlation, (Hotelling, 1936):

\gamma=\sqrt{\frac{1}{p}\sum_{i=1}^{p}\lambda_i^2}

2.Vector correlation, (Hooper, 1959):

\theta=\sqrt{\prod_{i=1}^{p}\lambda_i^2}

Value

The outputs are the following scalar values.

r

One minus the trace correlation. That is, r=1-\gamma

q

One minus the vector correlation. That is, q=1-\theta
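The two distances can be sketched in base R as follows. This is an illustrative helper, not the dsp() source: subspace_distance() is a hypothetical name, the columns of A and B are orthonormalized first (an assumption, so that only the spanned subspaces matter), and the average and product are taken over the d eigenvalues of the d-by-d matrix \textbf{B}^T\textbf{A}\textbf{A}^T\textbf{B}.

```r
# Hypothetical helper, not the package's dsp() implementation.
subspace_distance <- function(A, B) {
  # Orthonormal bases of the two column spaces.
  QA <- qr.Q(qr(A))
  QB <- qr.Q(qr(B))
  # Eigenvalues of B^T A A^T B (with orthonormal bases) lie in [0, 1].
  M <- crossprod(QB, QA) %*% crossprod(QA, QB)
  lam2 <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
  lam2 <- pmin(pmax(lam2, 0), 1)   # guard against rounding error
  gamma <- sqrt(mean(lam2))        # trace correlation
  theta <- sqrt(prod(lam2))        # vector correlation
  c(r = 1 - gamma, q = 1 - theta)
}

set.seed(1)
A <- matrix(rnorm(10), 5, 2)
# Same column space, different basis: both distances are (numerically) zero.
same <- subspace_distance(A, A %*% matrix(c(2, 1, 0, 3), 2, 2))
# Orthogonal column spaces: both distances equal one.
E <- diag(4)
orth <- subspace_distance(E[, 1:2], E[, 3:4])
```

Both r and q lie between 0 and 1, with 0 indicating identical subspaces.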

References

Hooper J. (1959). Simultaneous Equations and Canonical Correlation Theory. Econometrica 27, 245-256.

Hotelling H. (1936). Relations Between Two Sets of Variates. Biometrika 28, 321-377.

Bootstrap Estimation for Hyperparameters.

Description

The “hyperPara()” function estimates the hyperparameters required by the Fourier transformation method.

Usage

hyperPara(y,x,d,wx=0.1,wy=1,wh=1.5,range=seq(0.1,1,by=.1),
xdensity="normal", B=500,space="mean", method="FM",hyper="wy")

Arguments

y

The n-dimensional response vector.

x

The design matrix of the predictors with dimension n-by-p.

d

An integer specifying the dimension of the sufficient dimension reduction subspace.

wx

(default 0.1). Tuning parameter for the predictor variables.

wy

(default 1). Tuning parameter for the response variable.

wh

(default 1.5). Tuning parameter for the bandwidth.

range

(default 0.1,0.2,...,1). A sequence of candidate values for the hyperparameter.

xdensity

(default “normal”). Density function of predictor variables.

B

(default 500). Number of bootstrap samples.

space

(default “mean”). Specifies whether to estimate the central mean subspace (“mean”) or the central subspace (“pdf”).

method

(default “FM”). Integral transformation method. “FM” for the Fourier transformation method (Zhu and Zeng 2006), and “CM” for the convolution transformation method (Zeng and Zhu 2010).

hyper

(default “wy”). The hyperparameter to be estimated. Choices are “wx” and “wy”.

Value

The outputs are a table of average bootstrap distances between two subspaces for each candidate value of the hyperparameter, and the estimated hyperparameter.

dis_h

A table of average bootstrap distances for each candidate value of the hyperparameter.

h.hat

The estimated hyperparameter.

References

Zeng P. and Zhu Y. (2010). An Integral Transform Method for Estimating the Central Mean and Central Subspaces. Journal of Multivariate Analysis. 101, 1, 271–290.

Zhu Y. and Zeng P. (2006). Fourier Methods for Estimating the Central Subspace and Central Mean Subspace in Regression. Journal of the American Statistical Association. 101, 476, 1638–1651.

Integral Transformation Methods of Estimating SDR Subspaces in Regression.

Description

The “itdr()” function computes a basis for sufficient dimension reduction subspaces in regression.

Usage

itdr(y,x,d,m=50,wx=0.1,wy=1,wh=1.5,space="mean",
xdensity="normal",method="FM",x.scale=TRUE)

Arguments

y

The n-dimensional response vector.

x

The design matrix of the predictors with dimension n-by-p.

d

An integer specifying the dimension of the sufficient dimension reduction subspace.

m

(default 50). An integer specifying the number of omega values to use in the “invFM” method.

wx

(default 0.1). Tuning parameter for predictor variables.

wy

(default 1). Tuning parameter for response variable.

wh

(default 1.5). Bandwidth of the kernel density estimation function.

space

(default “mean”). Specifies whether to estimate the central mean subspace (“mean”) or the central subspace (“pdf”).

xdensity

(default “normal”). Density function of predictor variables. Options are “normal” for multivariate normal distribution, “elliptic” for elliptical contoured distribution, or “kernel” for an unknown distribution estimated using the kernel smoothing method.

method

(default “FM”). Integral transformation method. “FM” for the Fourier transformation method (Zhu and Zeng 2006), “CM” for convolution transformation method (Zeng and Zhu 2010), “iht” for the iterative Hessian transformation method (Cook and Li 2002), and “invFM” for the Fourier transformation approach for inverse dimension reduction method (Weng and Yin, 2018).

x.scale

(default TRUE). If TRUE, scale the predictor variables.

Details

Let m(x)=E[y|X=x]. The “itdr()” function computes the integral transformation of the gradient of the mean function m(x), which is defined as

\boldsymbol\psi(\boldsymbol\omega) =\int \frac{\partial}{\partial \textbf{x}}m(\textbf{x}) W(\textbf{x},\boldsymbol\omega)f(\textbf{x})d\textbf{x},

where W(\textbf{x},\boldsymbol\omega) is a non-degenerate, absolutely integrable kernel function. For the Fourier transformation (FM) method, W(\textbf{x},\boldsymbol\omega)=\exp(i\boldsymbol\omega^T\textbf{x}), and for the convolution transformation (CM) method, W(\textbf{x},\boldsymbol\omega)=H(\textbf{x}-\boldsymbol\omega)=(2\pi\sigma_w^2)^{-p/2}\exp(-(\textbf{x}-\boldsymbol{\omega})^T(\textbf{x}-\boldsymbol\omega)/(2\sigma_w^2)), where \sigma_w^2 is the tuning parameter for the predictor variables. The candidate matrix to estimate the central mean subspace (CMS) is

\textbf{M}_{CMS}=\int \boldsymbol\psi(\boldsymbol\omega) \boldsymbol\psi(\boldsymbol\omega)^T K(\boldsymbol\omega)d\boldsymbol\omega,

where K(\boldsymbol{\omega})=(2\pi \sigma_w^2)^{-p/2}\exp(-||\boldsymbol{\omega}||^2/(2\sigma_w^2)) under “FM”, and K(\boldsymbol{\omega})=1 under “CM”. Here, \sigma_w^2 is a tuning parameter; it is referred to as the "tuning parameter for the predictor variables" and is denoted by “wx” in all functions.
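As a small numerical check of the “FM” weight, the sketch below evaluates K(\boldsymbol{\omega}) in base R and compares it with the density of a N(0, \sigma_w^2 I) vector. It assumes that the exponent uses the squared norm of \boldsymbol{\omega} (a Gaussian weight) and that “wx” is the variance \sigma_w^2; K_weight() is a hypothetical helper, not a function of the package.

```r
# Gaussian weight K(omega) with variance wx (= sigma_w^2).
# Assumption: the exponent uses the squared Euclidean norm of omega.
K_weight <- function(omega, wx) {
  p <- length(omega)
  (2 * pi * wx)^(-p / 2) * exp(-sum(omega^2) / (2 * wx))
}

omega <- c(0.2, -0.1, 0.05)
wx <- 0.1

k_direct <- K_weight(omega, wx)
# Same value as a product of univariate normal densities with sd = sqrt(wx).
k_dnorm <- prod(dnorm(omega, mean = 0, sd = sqrt(wx)))
```

A smaller “wx” concentrates the weight near \boldsymbol{\omega}=0.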

Let \{T_v(y)=H(y,v),~ for~~ y,v\in \mathcal{R}\} be the family of transformations for the response variable. That is, for v \in \mathcal{R}, the mean response of T_v(y) is m(\textbf{x},v)=E[H(y,v)\vert \textbf{X}=\textbf{x}]. Then, the integral transformation for the gradient of m(\textbf{x},v) is defined as

\boldsymbol{\psi}(\boldsymbol{\omega},v)=\int \frac{\partial}{\partial \textbf{x}}m(\textbf{x},v) W(\textbf{x},\boldsymbol{\omega})f(\textbf{x})d\textbf{x},

where W(\textbf{x},\boldsymbol{\omega}) is defined as above. Then, for estimating the central subspace (CS), the candidate matrix is defined as

\textbf{M}_{CS}=\int H(y_1,v)H(y_2,v)dv \int \boldsymbol{\psi}(\boldsymbol{\omega}) \bar{\boldsymbol{\psi}}(\boldsymbol{\omega})^T K(\boldsymbol{\omega})d\boldsymbol{\omega},

where K(\boldsymbol{\omega}) is the same as above, and H(y,v)=(2\pi \sigma_t^2)^{-1/2}\exp(-v^2/(2\sigma_t^2)) under “FM”, and H(y,v)=(2\pi \sigma_t^2)^{-1/2}\exp(-(y-v)^2/(2\sigma_t^2)) under “CM”. Here, \sigma_t^2 is a tuning parameter; it is referred to as the "tuning parameter for the response variable" and is denoted by “wy” in all functions.

Remark: There is only one tuning parameter in the candidate matrix for estimating the CMS, and there are two tuning parameters in the candidate matrix for estimating the CS.

“invFM” method:

Let (\textbf{y}_i,\textbf{x}_i), i =1,\cdots,n, be a random sample, and assume that the dimension of S_{E(\textbf{Z} | \textbf{Y})} is known to be d. Then, for a random finite sequence of \boldsymbol{\omega}_j \in {R}^p, j=1,\cdots,t, compute \widehat{\boldsymbol{\psi}}(\boldsymbol{\omega}_j) as follows (Weng and Yin, 2018):

\widehat{\boldsymbol{\psi}}(\boldsymbol{\omega}_j)=n^{-1}\sum_{k=1}^n \exp( i \boldsymbol{\omega}_j^T\textbf{y}_k)\widehat{\textbf{Z}}_k, j=1,\cdots,t,

where \widehat{\textbf{Z}}_k=\boldsymbol{\Sigma}_{x}^{-1/2}(\textbf{x}_k-\overline{\textbf{x}}). Now, let \textbf{a}(\boldsymbol{\omega}_j)=Real(\widehat{\boldsymbol{\psi}}(\boldsymbol{\omega}_j)), and \textbf{b}(\boldsymbol{\omega}_j)=Imag(\widehat{\boldsymbol{\psi}}(\boldsymbol{\omega}_j)). Then, \widehat{\boldsymbol{\Psi}}= (\textbf{a}(\boldsymbol{\omega}_1),\textbf{b}(\boldsymbol{\omega}_1),\cdots,\textbf{a}(\boldsymbol{\omega}_t),\textbf{b}(\boldsymbol{\omega}_t)), for some t > 0, and the sample kernel matrix is \widehat{\textbf{V}} = \widehat{\boldsymbol{\Psi}}\widehat{\boldsymbol{\Psi}}^T. Finally, use the d leading eigenvectors of \widehat{\textbf{V}} as an estimate for the central subspace.

Remark: We use w instead of \boldsymbol{\omega}_1,\cdots,\boldsymbol{\omega}_t in the itdr() function.
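The “invFM” steps can be traced with a short base R sketch on simulated data. It is only an illustration under stated assumptions, not the code used by itdr(): the response is univariate (so each omega is a scalar), the omegas are drawn from a standard normal distribution, and the eigenvectors are mapped back to the original predictor scale by multiplying with \boldsymbol{\Sigma}_x^{-1/2}. All object names are hypothetical.

```r
set.seed(123)
n <- 200
p <- 5
d <- 1
t_num <- 10                        # number of omegas, written t in the text
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 0.1 * rnorm(n)       # simulated response; true direction is e_1

# Z_k = Sigma_x^{-1/2} (x_k - xbar)
xc <- scale(x, center = TRUE, scale = FALSE)
eig <- eigen(cov(x), symmetric = TRUE)
Sig_inv_half <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)
Z <- xc %*% Sig_inv_half

# psi_hat(omega_j) = n^{-1} sum_k exp(i * omega_j * y_k) * Z_k  (a complex p-vector)
omega <- rnorm(t_num)              # assumed sampling scheme for the omegas
psi <- sapply(omega, function(w) colMeans(exp(1i * w * y) * Z))

# Psi_hat = (a(omega_1), b(omega_1), ..., a(omega_t), b(omega_t)), p-by-2t
Psi <- matrix(rbind(Re(psi), Im(psi)), nrow = p)

# V_hat = Psi_hat Psi_hat^T and its d leading eigenvectors
V <- Psi %*% t(Psi)
ev <- eigen(V, symmetric = TRUE)
eta_z <- ev$vectors[, 1:d, drop = FALSE]

# Back-transform to the original predictor scale (assumption).
eta_hat <- Sig_inv_half %*% eta_z
cos_e1 <- abs(eta_hat[1, 1]) / sqrt(sum(eta_hat^2))  # alignment with e_1
```

In this simulation the response depends on the predictors only through the first coordinate, so eta_hat is expected to load mainly on that coordinate.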

Value

The outputs are a p-by-d matrix and a p-by-p matrix defined as follows.

eta_hat

The estimated p-by-d matrix, whose columns form a basis of the CMS/CS.

M

The estimated p-by-p candidate matrix.

eigenvalues

Eigenvalues of \widehat{\textbf{V}} from the “invFM” method.

psi

The estimate \widehat{\boldsymbol{\Psi}} from the “invFM” method.

References

Cook R. D. and Li, B., (2002). Dimension Reduction for Conditional Mean in Regression. The Annals of Statistics. 30, 455-474.

Weng J. and Yin X. (2018). Fourier Transform Approach for Inverse Dimension Reduction Method. Journal of Nonparametric Statistics. 30, 4, 1029-0311.

Zeng P. and Zhu Y. (2010). An Integral Transform Method for Estimating the Central Mean and Central Subspaces. Journal of Multivariate Analysis. 101, 1, 271–290.

Zhu Y. and Zeng P. (2006). Fourier Methods for Estimating the Central Subspace and Central Mean Subspace in Regression. Journal of the American Statistical Association. 101, 476, 1638–1651.

Examples

data(automobile)
head(automobile)
automobile.na <- na.omit(automobile)
wx <- .14
wy <- .9
wh <- 1.5
d <- 2
p <- 13
df <- cbind(automobile[, c(26, 10, 11, 12, 13, 14, 17, 19, 20, 21, 22, 23, 24, 25)])
dff <- as.matrix(df)
automobi <- dff[complete.cases(dff), ]
y <- automobi[, 1]
x <- automobi[, c(2:14)]
xt <- scale(x)
# pass the tuning parameters by name: the fourth positional argument of itdr() is m
fit.F_CMS <- itdr(y, xt, d, wx = wx, wy = wy, wh = wh, space = "pdf", xdensity = "normal", method = "FM")
round(fit.F_CMS$eta_hat, 2)

Integral Transformation Methods for SDR Subspaces in Multivariate Regression

Description

The “mitdr()” function implements integral transformation methods for estimating SDR subspaces in multivariate regression.

Usage

mitdr(X,Y,d,m,method="FT-IRE",
lambda=NA,noB=5,noC=20,noW=2,sparse.cov=FALSE,x.scale=FALSE)

Arguments

X

Design matrix with dimension n-by-p

Y

Response matrix with dimension n-by-q

d

Structure dimension (default 2).

m

The number of omegas, which gives 2m integral transforms.

method

(default “FT-IRE”) Specify the method of dimension reduction. Other possible choices are “FT-DIRE”,“FT-SIRE”,“FT-RIRE”, “FT-DRIRE”, and “admmft”.

lambda

Tuning Parameter for “admmft” method. If it is not provided, the optimal lambda value is chosen by cross-validation of the Fourier transformation method.

noB

(default 5) Iterations for updating B. Only required for the “admmft” method.

noC

(default 20) Iterations for updating C. Only required for the “admmft” method.

noW

(default 2) Iterations for updating weight. Only required for the “admmft” method.

sparse.cov

(default FALSE) If TRUE, calculates the soft-threshold matrix. Only required for the “admmft” method.

x.scale

(default FALSE) If TRUE, standardizes each variable for the soft-threshold matrix. Only required for the “admmft” method.

Details

The “mitdr()” function selects the sufficient variables using Fourier transformation sparse inverse regression estimators.

Value

The function output is a p-by-d matrix and the estimated covariance matrix.

Beta_hat

An estimator for the SDR subspace.

sigma_X

Estimated covariance matrix only from the “admmft” method and a null matrix for other methods.

References

Weng, J. (2022), Fourier Transform Sparse Inverse Regression Estimators for Sufficient Variable Selection, Computational Statistics & Data Analysis, 168, 107380.

Weng, J., & Yin, X. (2022). A Minimum Discrepancy Approach with Fourier Transform in Sufficient Dimension Reduction. Statistica Sinica, 32.

Examples

## Not run: 
data(prostate)
Y <- as.matrix(prostate[, 9])
X <- as.matrix(prostate[, -9])
fit.ftire <- mitdr(X, Y, d = 1, method = "FT-DRIRE")
fit.ftire$Beta_hat

## End(Not run)

Planning Database (Published in year 2015)

Description

The Planning Database (pdb) contains selected data from the 2010 Census and the 2009-2013 5-year American Community Survey (ACS) estimates.

Usage

data(pdb)

Format

A dataset with 815 observations and 344 attributes.

Source

https://www.census.gov/data/datasets/2015/adrm/research/2015-planning-database.html

Prostate Levels

Description

The dataset comprises the level of prostate-specific antigen associated with eight clinical measures in 97 male patients undergoing a radical prostatectomy.

Usage

data(prostate)

Format

An object of class data.frame with 97 rows and 9 columns.

References

Stamey, T. A., Kabalin, J. N., McNeal, J. E., Johnstone, I. M., Freiha, F., Redwine, E. A. et al. (1989). Prostate Specific Antigen in the Diagnosis and Treatment of Adenocarcinoma of the Prostate. II. Radical Prostatectomy Treated Patients. The Journal of Urology. 141, 1076-1083.

Raman Spectroscopy

Description

The Raman dataset contains 69 samples providing fatty acid information in terms of the percentage of total sample weight and the percentage of total fat content.

Usage

data(raman)

Format

An object of class data.frame with 69 rows and 1100 columns.

References

Naes T., Tomic O., Afseth N.K., Segtnan V., Mage I. (2013) Multi-Block Regression Based on Combinations of Orthogonalisation, Pls-Regression and Canonical Correlation Analysis. Chemometrics and Intelligent Laboratory Systems. 124, 32-42.

Recumbent Cows

Description

The recumbent dataset contains information on pregnant dairy cows that become recumbent, i.e., they lie down shortly before or after calving for unknown reasons. This condition can be severe and frequently leads to death of the cow. Clark, Henderson, Hoggard, Ellison and Young (1987) analyzed data collected at the Ruakura (N.Z.) Animal Health Laboratory on a sample of recumbent cows.

Usage

data(recumbent)

Format

A dataset with 9 columns and 435 rows.

Source

Clark, R. G., Henderson, H. V., Hoggard, G. K., Ellison, R. S. and Young, B. J. (1987). The Ability of Biochemical and Haematological Tests to Predict Recovery in Periparturient Recumbent Cows. NZ Veterinary Journal. 35, 126-133.