--- title: "itdr-vignette" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{itdr-vignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Overview **itdr** is a system for estimating a basis of the central and central mean subspaces or selecting sufficient dimension reduction variables in regression using integral transformation methods. This `vignette` demonstrate the usage of functions within the **itdr** package across various datasets, including `automobile`, `recumbent` ,`pdb`, `prostate`, and `raman`. ## Chapter 1: Installation ### 1.1: Install **itdr** package The **itdr** R package can be installed in three ways: * From the Comprehensive R Archive Network (CRAN): Utilize the `install.packages()` function in R. Then, import the `itdr` package into the working session using the `library()` function. That is, ```{r eval=FALSE, include=FALSE} install.packages("itdr") library(itdr) ``` * From the binary source package: Use the `intall.package()` function in R. Then, import the **itdr** package into the working session using the `library()` function. That is, ```{r eval=FALSE, include=TRUE} install.packages("~/itdr.zip") library(itdr) ``` * From GitHub: Use the `install_github()` function in R `devtools` package as follows. ```{r eval=FALSE, include=TRUE} library(devtools) install_github("TharinduPDeAlwis/itdr") library(itdr) ``` ## Chpater 2: Functions for Fourier and Convolution Transformation Methods in Sufficient Dimension Reduction (SDR) Subspace Estimation This section provides an overview of the functions within the **itdr** package that utilize the Fourier transformation method to estimate sufficient dimension reduction (SDR) subspaces in regression. Specifically, we focus on the Fourier transformation method. However, by passing the argument `method="CM"` to the `itdr()` function, the convolution transformation method can be employed. Before estimating the SDR subspaces, it is essential to determine the dimension (d) of the SDR subspace and tuning parameters `sw2`, and `st2`. Section 2.1 demonstrates the estimation of dimension (d) is demonstrated. The estimation of the tuning parameter `sw2` for both the central subspace (CS) and the central mean subspace (CMS), is explained in Section 2.2.1. Moreover, Section 2.2.2 outlines the estimation of `st2` for the central subspace (CS). Finally, The practical application of the itdr() function for estimating the central subspace is provided in Section 2.3. ### 2.1: Estimating the dimension (d) of sufficient dimension reduction (SDR) subspaces A bootstrap estimation procedure is employed to estimate the unknown dimension (d) of sufficient dimension reduction subspaces, as detailed in Zhu and Zeng (2006). The `d.boots()` function can be used to implement this estimation. To estimate the dimension `(d)` of the central subspace of the `automobile` dataset, with response variable $y$, and predictor variables $x$ as specified in Zhu and Zeng (2006), the following arguments are passed to the `d.boots()` function: `space="pdf"` to estimate the CS, `xdensity="normal"` for assuming normal density for the predictors, and `method="FM"` for utilizing the Fourier transformation method. ```{r eval=TRUE, include=TRUE} library(itdr) data(automobile) automobile.na <- na.omit(automobile) # prepare response and predictor variables auto_y <- log(automobile.na[, 26]) auto_xx <- automobile.na[, c(10, 11, 12, 13, 14, 17, 19, 20, 21, 22, 23, 24, 25)] auto_x <- scale(auto_xx) # Standardize the predictors # call to the d.boots() function with required #arguments d_est <- d.boots(auto_y, auto_x, var_plot = TRUE, space = "pdf", xdensity = "normal", method = "FM" ) auto_d <- d_est$d.hat auto_d ``` Here, the estimate of the dimension of the central subspace for 'automobile' data is 2, i.e., d_hat=2. ### 2.2: Estimating tuning parameters and bandwidth parameters for Gaussian kernel density estimation In the process of estimating SDR subspaces using the Fourier method, two crucial parameters need to be determined: `sw2` and `st2`. The `sw2`. While `sw2` is required for both the central mean (CMS) and the central subspace (CS), `st2` is only necessary for the central subspace. The code in Section 2.2.1 demonstrates the use of function `wx()` to estimate the tuning parameter `sw2`, while Section 2.2.2 elaborates on using the `wy()` function to estimate the tuning parameter `st2`. ### 2.2.1: Estimate `sw2` To estimate the tuning parameter `sw2`, the `wx()` function is utilized with the subspace option set to either `space="pdf"` for the CS and `space="mean"` for the CMS. Other parameters remain fixed during the estimation process. The following R code chunk demonstrates the estimation of `sw2` for the central subspace. ```{r eval=TRUE, include=TRUE} auto_d <- 2 # The estimated value from Section 2.1 candidate_list<-seq(0.05, 1, by = 0.01) auto_sw2 <- hyperPara(auto_y, auto_x, auto_d, range = candidate_list,xdensity = "normal", B = 500, space = "pdf", method = "FM", hyper = "wx") auto_sw2$wx.hat # we get the estimator for sw2 as 0.09 ``` ### 2.2.2: Estimate `st2` The estimate of the tuning parameter `st2`, the `wy()` function is employed. Other parameters are fixed during the estimation. Notice that, there is no need to specify the `space`, because the tuning parameter `st2` exclusively required for the central subspace (CS). ```{r eval=TRUE, include=TRUE} auto_d <- 2 # Estimated value from Section 2.1 auto_st2 <- hyperPara(auto_y, auto_x, auto_d, wx = 0.1, range = seq(0.1, 1, by = 0.1), xdensity = "normal", method = "FM", hyper = "wy") auto_st2$wy.hat # we get the estimator for st2=0.9 ``` ### 2.2.3: Estimate the bandwidth (`h`) of the Gaussian kernel density function When the distribution function of the predictor variables is unknown, Gaussian kernel density estimation is utilized to approximate the density function of the predictor variables. In such cases, the bandwidth parameter needs to be estimated, especially when `xdensity="kernel"` is specified. The `wh()` function uses the bootstrap estimator to estimate the bandwidth of the Gaussian kernel density estimation. ```{r eval=TRUE, include=TRUE} h_hat <- hyperPara(auto_y, auto_x, auto_d, wx = 5, wy = 0.1, range = seq(0.1, 2, by = .1), space = "pdf", method = "FM", hyper = "wh") # Bandwidth estimator for Gaussian kernel density estimation for central subspace h_hat$h.hat # we have the estimator as h_hat=0.1 ``` Ensure proper parameterization and selection of options for accurate estimation. ### 2.3: Estimate SDR subspaces Having outlined the estimation procedure for tuning parameters in the Fourier method in Sections 2.1-2.2, we are now prepared to estimate the SDR subspaces. Zhu and Zeng (2006) utilized the Fourier method to facilitate the estimation of the SDR subspaces when the predictors follow a multivariate normal distribution. However, when the predictor variables follow an elliptical distribution or, more generally, when the distribution of the predictors is unknown, the predictors' distribution function is approximated using Gaussian kernel density estimation (Zeng and Zhu, 2010). The `itdr()` function is employed to estimate the SDR subspaces under the `FM` method as follows. Since the default setting of the `itdr()` function has `method="FM"`, it is optional to specify the method as "FM". ```{r eval=TRUE, include=TRUE} library(itdr) data(automobile) head(automobile) df <- cbind(automobile[, c(26, 10, 11, 12, 13, 14, 17, 19, 20, 21, 22, 23, 24, 25)]) dff <- as.matrix(df) automobi <- dff[complete.cases(dff), ] d <- 2 # Estimated value from Section 2.1 wx <- .14 # Estimated value from Section 2.2.1 wy <- .9 # Estimated value from Section 2.2.2 wh <- 1.5 # Estimated value from Section 2.2.3 p <- 13 # Estimated value from Section 2.3 y <- automobi[, 1] x <- automobi[, c(2:14)] xt <- scale(x) # Distribution of the predictors is a normal distribution fit.F_CMS <- itdr(y, xt, d, wx, wy, wh, space = "pdf", xdensity = "normal", method = "FM") round(fit.F_CMS$eta_hat, 2) # Distribution of the predictors is a unknown (using kernel method) fit.F_CMS <- itdr(y, xt, d, wx, wy, wh, space = "pdf", xdensity = "kernel", method = "FM") round(fit.F_CMS$eta_hat, 2) ``` ## Chapter 3: Functions related with estimating the central mean subspace using Iterative Hessian Transformation (IHT) The `itdr()` function extends its utility to estimating the central mean subspace in regression through the iterative Hessian transformation (IHT) method. The following R chunk illustrates the application of this method on the `Recumbent` dataset available within the **itdr** package. Notice that the function requires inputs such as the method of estimation (`method=iht`), the response vector (y), the design matrix of predictors (x), and the dimension (d) of the CMS. These choices for the response and predictors align with those of Cook and Li (2002). ```{r eval=TRUE, include=TRUE} library(itdr) data("recumbent") recumbent.na <- na.omit(recumbent) y <- recumbent.na$outcome X1 <- log(recumbent.na$ast) X2 <- log(recumbent.na$ck) X3 <- log(recumbent.na$urea) p <- 3 x <- matrix(c(X1, X2, X3), ncol = p) d <- 2 fit.iht_CMS <- itdr(y, x, 2, method = "iht") fit.iht_CMS$eta_hat ``` ## Chapter 4: Estimating the Central Subspace in Regression using Fourier Transformation Approach on Inverse Dimension Reduction In this section, we demonstrate the functions within the **itdr** package related to the Fourier transformation approach on inverse dimension reduction. Section 4.1 illustrates the use of function to estimate the dimension of the central subspace using the Fourier transformation approach on inverse dimension reduction, while Section 4.2 outlines the estimation process for the CS itself. ### 4.1: Estimating d The estimation of the dimension of the CS can be achieved using the `d.test()` function which provides outputs of three different p-values for three different test statistics: the Weighted Chi-square test statistic (Weng and Yin, 2018), the Scaled test statistic (Bentler and Xie, 2000), and the Adjusted test statistic (Bentler and Xie, 2000). Suppose `m` is the candidate dimension of the CS to be tested `(H_0: d=m)`, then the following R code demonstrates testing a candidate value `m` (