Type: Package
Title: GCC Estimation of the Multilevel Factor Model
Version: 1.1.0
Maintainer: Rui Lin <ruilin1081@gmail.com>
Description: Provides methods for model selection, estimation, inference, and simulation for the multilevel factor model, based on principal component estimation and the generalised canonical correlation approach. Details can be found in "Generalised Canonical Correlation Estimation of the Multilevel Factor Model." Lin and Shin (2025) <doi:10.2139/ssrn.4295429>.
Imports: stats, stringr, sandwich
Suggests: plm
License: GPL (≥ 3)
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Depends: R (≥ 2.10)
NeedsCompilation: no
Packaged: 2025-06-27 08:28:46 UTC; Administrator
Author: Rui Lin [aut, cre], Yongcheol Shin [aut]
Repository: CRAN
Date/Publication: 2025-06-27 09:00:02 UTC

Generalised canonical correlation estimation for the global factors

Description

This function is one of the main functions of the package. It applies generalised canonical correlation estimation to obtain the global factors \boldsymbol{G} and, when it is not explicitly provided, the number of global factors r_{0}. It is mainly intended for internal use; users who only need to estimate the number of global factors can call [GCC()] instead of [multilevel()].
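For intuition, the following base-R sketch shows a generic MAXVAR-type generalised canonical correlation construction. It estimates a PC factor space within each block, averages the corresponding projection matrices, and takes the leading eigenvectors of the average as global factor estimates. The helper gcc_sketch(), the common number k of factors per block, and the simulated data are assumptions for illustration. This is not the package's internal code, and GCC() may differ in details such as how r_{0} is selected.

```r
# Generic MAXVAR-type generalised canonical correlation sketch (base R only).
# gcc_sketch() is a hypothetical helper, not a function exported by this package.
gcc_sketch <- function(Y_list, k, r0) {
  # Y_list: list of R matrices, each T x N_i; k: number of PC factors per block
  T_obs <- nrow(Y_list[[1]])
  P_bar <- matrix(0, T_obs, T_obs)
  for (Y in Y_list) {
    # orthonormal basis of the block's PC factor space: top-k eigenvectors of Y Y'
    V <- eigen(tcrossprod(Y), symmetric = TRUE)$vectors[, seq_len(k), drop = FALSE]
    P_bar <- P_bar + tcrossprod(V)   # projection matrix V V'
  }
  P_bar <- P_bar / length(Y_list)    # average projection across blocks
  e <- eigen(P_bar, symmetric = TRUE)
  # eigenvalues close to 1 point to directions shared by every block
  list(G = sqrt(T_obs) * e$vectors[, seq_len(r0), drop = FALSE],
       values = e$values)
}

# usage: three blocks sharing one global factor g, each with one local factor
set.seed(1)
T_obs <- 100
g <- rnorm(T_obs)
Y_list <- lapply(1:3, function(i) {
  f_i <- rnorm(T_obs)
  outer(g, rnorm(50)) + outer(f_i, rnorm(50)) + matrix(rnorm(T_obs * 50), T_obs)
})
fit <- gcc_sketch(Y_list, k = 2, r0 = 1)
round(fit$values[1:3], 2)
abs(cor(fit$G[, 1], g))
```

In this sketch, the eigenvalues of the averaged projection that lie close to 1 correspond to directions spanned by every block's factor space, which is why they signal global factors.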

Usage

GCC(
  data,
  standarise = TRUE,
  r_max = 10,
  r0 = NULL,
  ri = NULL,
  depvar_header = NULL,
  i_header = NULL,
  j_header = NULL,
  t_header = NULL
)

Arguments

data

Either a data.frame or a list of data matrices of length R. See Details.

standarise

A logical indicating whether the data is standardised before estimation or not. See Details.

r_max

An integer indicating the maximum number of factors allowed. See Details.

r0

An integer specifying the number of global factors. See Details.

ri

An array of length R containing the number of local factors in each block. See Details.

depvar_header

A character string specifying the header of the dependent variable. See Details.

i_header

A character string specifying the header of the block identifier. See Details.

j_header

A character string specifying the header of the individual identifier. See Details.

t_header

A character string specifying the header of the time identifier. See Details.

Details

The user-supplied data.frame should contain at least four columns: the dependent variable (y_{ijt}), the block identifier (i), the individual identifier (j), and the time identifier (t). The names of these columns must be passed to the function through the arguments "depvar_header", "i_header", "j_header", and "t_header", respectively. If the data are supplied as a list, these arguments are ignored.

If either r0 = NULL or ri = NULL, both r0 and ri will be estimated; in that case, "r_max" must be supplied. If both "r0" and "ri" are supplied, "r_max" is not needed and is ignored.

If standarise = TRUE, each time series is standardised to have zero mean and unit variance.
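A minimal sketch of this standardisation in base R, assuming the data are held as a list of T \times N_{i} block matrices. The helper standardise_blocks() is hypothetical. Base R's scale() divides by the sample standard deviation (n - 1 denominator), which may not match the package's internal convention.

```r
# Standardise every column (one time series per individual) of each block.
# standardise_blocks() is a hypothetical helper; scale() uses the n - 1 denominator.
standardise_blocks <- function(Y_list) {
  lapply(Y_list, function(Y) scale(Y, center = TRUE, scale = TRUE))
}

set.seed(2)
Y_list <- list(matrix(rnorm(40, mean = 5, sd = 3), nrow = 10),
               matrix(rnorm(30, mean = -2, sd = 0.5), nrow = 10))
Z_list <- standardise_blocks(Y_list)
sapply(Z_list, function(Z) max(abs(colMeans(Z))))  # numerically 0
sapply(Z_list, function(Z) range(apply(Z, 2, sd)))  # numerically 1
```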

Value

A list containing the estimated number of global factors \hat{r}_{0}, the estimated global factors \widehat{\boldsymbol{G}}, and other elements used by multilevel().

References

Lin, R. and Shin, Y., 2025. Generalised Canonical Correlation Estimation of the Multilevel Factor Model. Available at SSRN 4783804.

Examples


panel <- UKhouse # load the data
Y_list <- panel2list(panel, depvar_header = "dlPrice", i_header = "Region",
                                       j_header = "LPA_Type", t_header = "Date")
est_GCC <- GCC(Y_list, r_max = 10)
r0_hat <- est_GCC$r0 # number of global factors
G_hat <- est_GCC$G # global factors

Principal component (PC) estimation of the approximate factor model

Description

Perform PC estimation of the (2D) approximate factor model:

y_{it}=\boldsymbol{\lambda}_{i}^{\prime}\boldsymbol{F}_{t}+e_{it},

or in matrix notation:

\boldsymbol{Y}=\boldsymbol{F}\boldsymbol{\Lambda}^{\prime}+\boldsymbol{e}.

The factor matrix \boldsymbol{F} is estimated as \sqrt{T} times the r eigenvectors of \boldsymbol{Y}\boldsymbol{Y}^{\prime} corresponding to its r largest eigenvalues (in descending order), and the loading matrix is estimated by \widehat{\boldsymbol{\Lambda}}=T^{-1}\boldsymbol{Y}^{\prime}\widehat{\boldsymbol{F}}. See, e.g., Bai and Ng (2002).
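The estimator above can be sketched directly in base R. pc_sketch() is a hypothetical name used for illustration, not the package's PC(), whose returned element names may differ.

```r
# Base-R sketch of PC estimation of the approximate factor model.
# pc_sketch() is hypothetical and not the package's PC().
pc_sketch <- function(Y, r) {
  T_obs <- nrow(Y)
  # eigen() returns the eigenvalues of the symmetric T x T matrix Y Y' in decreasing order
  V <- eigen(tcrossprod(Y), symmetric = TRUE)$vectors[, seq_len(r), drop = FALSE]
  F_hat <- sqrt(T_obs) * V                   # normalisation F'F / T = I_r
  Lambda_hat <- crossprod(Y, F_hat) / T_obs  # Lambda = T^{-1} Y' F
  list(F = F_hat, Lambda = Lambda_hat)
}

set.seed(3)
T_obs <- 100; N <- 50; r <- 2
F0 <- matrix(rnorm(T_obs * r), nrow = T_obs)
L0 <- matrix(rnorm(N * r), nrow = N)
Y <- F0 %*% t(L0) + matrix(rnorm(T_obs * N), nrow = T_obs)
est <- pc_sketch(Y, r)
round(crossprod(est$F) / T_obs, 10)  # identity matrix
# factors are identified only up to rotation, so compare factor spaces via R^2
summary(lm(F0[, 1] ~ est$F))$r.squared
```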

Usage

PC(Y, r)

Arguments

Y

A T \times N data matrix, where T is the number of time-series observations and N is the cross-sectional dimension.

r

An integer specifying the number of factors.

Value

A list containing the estimated factors and factor loadings.

References

Bai, J. and Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica, 70(1), pp.191-221.

Examples


# simulate data

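T_obs <- 100 # number of periods; T_obs and F_true avoid masking R's T and F shorthands
N <- 50
r <- 2
F_true <- matrix(stats::rnorm(T_obs * r, 0, 1), nrow = T_obs)
Lambda <- matrix(stats::rnorm(N * r, 0, 1), nrow = N)
err <- matrix(stats::rnorm(T_obs * N, 0, 1), nrow = T_obs)
Y <- F_true %*% t(Lambda) + err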

# estimation

est_PC <- PC(Y, r)

England and Wales House Price Growth Data Categorised by Regions

Description

A data.frame containing the quarterly (mean) house prices of four property types (detached, semi-detached, terraced, and flats/maisonettes) for 331 local planning authorities (LPAs) over the period 1996Q1 to 2021Q2. See also Lin and Shin (2023).

Usage

UKhouse

Format

## 'UKhouse'

Details

Each LPA belongs to one of ten regions: North East (NE), North West (NW), Yorkshire and the Humber (YH), East Midlands (EM), West Midlands (WM), East of England (EE), London (LD), South East (SE), South West (SW) and Wales (WA). The real house price growth of the j-th LPA-type pair in region i is computed by deflating the nominal house price by the CPI and log-differencing it:

\pi_{ijt}=100\times \log\left(\frac{PRICE_{ijt}}{CPI_{t}}\right)-100 \times \log\left(\frac{PRICE_{ij,t-1}}{CPI_{t-1}}\right).
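As a worked illustration of the formula, the sketch below computes \pi_{ijt} for a single series in base R. real_growth() and the toy price and CPI values are hypothetical.

```r
# Real house price growth for one LPA-type series, following the formula above.
# real_growth() is hypothetical; price and cpi are aligned quarterly vectors.
real_growth <- function(price, cpi) {
  # 100 * [log(P_t / CPI_t) - log(P_{t-1} / CPI_{t-1})]; the first period is lost
  100 * diff(log(price / cpi))
}

price <- c(200000, 210000, 212000)
cpi   <- c(100, 101, 102)
real_growth(price, cpi)
```

Because of the differencing, a series observed over T + 1 periods yields T growth rates.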

After removing the series with missing observations, the result is a balanced panel with R = 10, N =\sum_{i=1}^{R} N_{i} = 1300 and T = 102.

Columns in the dataset:

Source

Office for National Statistics (ONS), ONS website, statistical bulletin, House price statistics for small areas in England and Wales: year ending June 2021

References

Lin, R. and Shin, Y., 2022. Generalised Canonical Correlation Estimation of the Multilevel Factor Model. Available at SSRN 4295429.


Check validity of the data and headers

Description

This is an internal function that checks the validity of the data and returns a list of R data matrices for estimation.
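The checks below are an assumption about what validity might involve: the required columns are present, the dependent variable has no missing values, and the panel is balanced. check_sketch() is a hypothetical helper, and check_data() may check different things.

```r
# Hypothetical validity checks; check_data() in the package may differ.
check_sketch <- function(panel, depvar_header, i_header, j_header, t_header) {
  headers <- c(depvar_header, i_header, j_header, t_header)
  missing_cols <- setdiff(headers, names(panel))
  if (length(missing_cols) > 0)
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  if (anyNA(panel[[depvar_header]]))
    stop("the dependent variable contains missing values")
  # balanced: each (i, j, t) key appears once and every (i, j) pair has every date
  keys <- panel[c(i_header, j_header, t_header)]
  n_units <- nrow(unique(panel[c(i_header, j_header)]))
  n_dates <- length(unique(panel[[t_header]]))
  if (anyDuplicated(keys) > 0 || nrow(panel) != n_units * n_dates)
    stop("the panel is not balanced")
  invisible(TRUE)
}

set.seed(6)
toy <- expand.grid(Date = 1:4, LPA_Type = c("a", "b"), Region = c("NE", "LD"))
toy$dlPrice <- rnorm(nrow(toy))
check_sketch(toy, "dlPrice", "Region", "LPA_Type", "Date")            # passes silently
try(check_sketch(toy[-1, ], "dlPrice", "Region", "LPA_Type", "Date")) # error: not balanced
```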

Usage

check_data(
  data,
  depvar_header = NULL,
  i_header = NULL,
  j_header = NULL,
  t_header = NULL
)

Arguments

data

Either a data.frame or a list of data matrices of length R. See Details.

depvar_header

A character string specifying the header of the dependent variable. See Details.

i_header

A character string specifying the header of the block identifier. See Details.

j_header

A character string specifying the header of the individual identifier. See Details.

t_header

A character string specifying the header of the time identifier. See Details.

Details

See Details of [GCC()].

Value

A list of data matrices of length R.

Examples

panel <- UKhouse # load the data
Y_list <- check_data(panel,
  depvar_header = "dlPrice", i_header = "Region",
  j_header = "LPA_Type", t_header = "Date"
)

Selection criteria for the approximate factor model

Description

This function performs model selection for the (2D) approximate factor model and returns the estimated number of factors.

Usage

infocrit(Y, method, r_max = 10)

Arguments

Y

A T \times N data matrix, where T is the number of time-series observations and N is the cross-sectional dimension.

method

A character string indicating which criterion to use. See Details.

r_max

An integer indicating the maximum number of factors allowed. 10 by default.

Details

"method" must be one of the following: "ICp2" or "BIC3" (Bai and Ng, 2002), "ER" (Ahn and Horenstein, 2013), or "ED" (Onatski, 2010).

Value

The estimated number of factors.
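To make two of these criteria concrete, here is a base-R sketch written from the published formulas of ICp2 (Bai and Ng, 2002) and ER (Ahn and Horenstein, 2013). select_r_sketch() is hypothetical; infocrit() may compute the criteria differently, for example in how the data are scaled.

```r
# Sketch of the ICp2 and ER criteria; select_r_sketch() is not the package's infocrit().
select_r_sketch <- function(Y, r_max = 10) {
  T_obs <- nrow(Y); N <- ncol(Y)
  stopifnot(r_max < min(N, T_obs))
  # squared singular values = eigenvalues of Y'Y (equivalently of Y Y'), decreasing
  ev <- svd(Y, nu = 0, nv = 0)$d^2
  k <- 0:r_max
  # V(k): mean squared residual after removing k principal components
  V <- (sum(ev) - c(0, cumsum(ev[seq_len(r_max)]))) / (N * T_obs)
  # ICp2: ln V(k) + k (N + T) / (N T) ln min(N, T), minimised over k
  icp2 <- log(V) + k * (N + T_obs) / (N * T_obs) * log(min(N, T_obs))
  # ER: ratio of adjacent eigenvalues mu_k / mu_{k+1}, maximised over k = 1, ..., r_max
  er <- ev[1:r_max] / ev[2:(r_max + 1)]
  c(ICp2 = k[which.min(icp2)], ER = which.max(er))
}

set.seed(4)
T_obs <- 100; N <- 50
Y <- matrix(rnorm(T_obs * 2), T_obs) %*% matrix(rnorm(2 * N), 2) +
  matrix(rnorm(T_obs * N), T_obs)
select_r_sketch(Y, r_max = 10)
```

The ER ratio is unaffected by rescaling the eigenvalues, so the sketch works with the raw squared singular values.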

References

Bai, J. and Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica, 70(1), pp.191-221.

Ahn, S.C. and Horenstein, A.R., 2013. Eigenvalue ratio test for the number of factors. Econometrica, 81(3), pp.1203-1227.

Onatski, A., 2010. Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics, 92(4), pp.1004-1016.

Examples

# simulate data

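T_obs <- 100 # number of periods; T_obs and F_true avoid masking R's T and F shorthands
N <- 50
r <- 2
F_true <- matrix(stats::rnorm(T_obs * r, 0, 1), nrow = T_obs)
Lambda <- matrix(stats::rnorm(N * r, 0, 1), nrow = N)
err <- matrix(stats::rnorm(T_obs * N, 0, 1), nrow = T_obs)
Y <- F_true %*% t(Lambda) + err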

# estimation

r_hat <- infocrit(Y, "BIC3", r_max = 10)

Full estimation of the multilevel factor model

Description

This is the main function of the package; it performs full estimation of the multilevel factor model.

Usage

multilevel(
  data,
  ic = "BIC3",
  standarise = TRUE,
  r_max = 10,
  r0 = NULL,
  ri = NULL,
  depvar_header = NULL,
  i_header = NULL,
  j_header = NULL,
  t_header = NULL
)

Arguments

data

Either a data.frame or a list of data matrices of length R. See Details.

ic

A character string specifying the selection criterion used to estimate the numbers of local factors. See Details.

standarise

A logical indicating whether the data is standardised before estimation or not. See Details.

r_max

An integer indicating the maximum number of factors allowed. See Details.

r0

An integer specifying the number of global factors. See Details.

ri

An array of length R containing the number of local factors in each block. See Details.

depvar_header

A character string specifying the header of the dependent variable. See Details.

i_header

A character string specifying the header of the block identifier. See Details.

j_header

A character string specifying the header of the individual identifier. See Details.

t_header

A character string specifying the header of the time identifier. See Details.

Details

The user-supplied data.frame should contain at least four columns: the dependent variable (y_{ijt}), the block identifier (i), the individual identifier (j), and the time identifier (t). The names of these columns must be passed to the function through the arguments "depvar_header", "i_header", "j_header", and "t_header", respectively. If the data are supplied as a list, these arguments are ignored.

If either r0 = NULL or ri = NULL, both r0 and ri will be estimated; in that case, "r_max" must be supplied. If both "r0" and "ri" are supplied, "r_max" is not needed and is ignored.

If standarise = TRUE, each time series is standardised to have zero mean and unit variance. Standardising the data before estimation is recommended.

See Lin and Shin (2025) for more details.
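The sketch below illustrates a generic second step of multilevel estimation, assuming the global factors \boldsymbol{G} have already been estimated (for example by GCC()). Each series in block i is regressed on \boldsymbol{G} by OLS, and PC estimation is applied to the residuals to obtain r_{i} local factors. local_step_sketch() is hypothetical, and multilevel() may proceed differently; see Lin and Shin (2025).

```r
# Generic two-step sketch given G (T x r0); local_step_sketch() is hypothetical.
local_step_sketch <- function(Y_i, G, r_i) {
  T_obs <- nrow(Y_i)
  # OLS of every series on G: N_i x r0 matrix of global loadings
  Gamma_hat <- t(solve(crossprod(G), crossprod(G, Y_i)))
  resid <- Y_i - G %*% t(Gamma_hat)  # remove the estimated global component
  # PC on the residuals: top r_i eigenvectors of resid resid'
  V <- eigen(tcrossprod(resid), symmetric = TRUE)$vectors[, seq_len(r_i), drop = FALSE]
  F_i <- sqrt(T_obs) * V                     # local factors, F'F / T = I
  Lambda_i <- crossprod(resid, F_i) / T_obs  # local loadings
  list(Gamma = Gamma_hat, F = F_i, Lambda = Lambda_i)
}

set.seed(5)
T_obs <- 100; N_i <- 60
g <- matrix(rnorm(T_obs), ncol = 1)  # global factor, treated as known here
f <- rnorm(T_obs)                    # true local factor of this block
Y_i <- g %*% t(rnorm(N_i)) + outer(f, rnorm(N_i)) + matrix(rnorm(T_obs * N_i), T_obs)
fit <- local_step_sketch(Y_i, g, r_i = 1)
abs(cor(fit$F[, 1], f))
```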

Value

The return value is an S3 object of class "multi_result", which is a list containing the estimation results.

References

Lin, R. and Shin, Y., 2025. Generalised Canonical Correlation Estimation of the Multilevel Factor Model. Available at SSRN 4783804.

Examples


panel <- UKhouse # load the data

# use data.frame
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
                           depvar_header = "dlPrice", i_header = "Region",
                           j_header = "LPA_Type", t_header = "Date")
# or one can use a list of data matrices
Y_list <- panel2list(panel, depvar_header = "dlPrice", i_header = "Region",
                                       j_header = "LPA_Type", t_header = "Date")
est_multi <- multilevel(Y_list, ic = "BIC3", standarise = TRUE, r_max = 5)

data.frame to list of data matrices

Description

This function converts the data.frame to a list of data matrices and finds the dimensions of the multilevel panel.

Usage

panel2list(
  panel,
  depvar_header = NULL,
  i_header = NULL,
  j_header = NULL,
  t_header = NULL
)

Arguments

panel

The user-supplied data frame for the multilevel panel data. See Details.

depvar_header

A character string specifying the header of the dependent variable. See Details.

i_header

A character string specifying the header of the block identifier. See Details.

j_header

A character string specifying the header of the individual identifier. See Details.

t_header

A character string specifying the header of the time identifier. See Details.

Details

See the details of GCC().

Value

A list containing the data matrices of the R blocks. Each of them has dimension T\times N_{i}.
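A minimal base-R sketch of this reshaping, assuming a balanced panel with one observation per (i, j, t) combination. panel2list_sketch() and the toy data are hypothetical, and panel2list() may order blocks and columns differently.

```r
# Long data.frame -> list of T x N_i matrices; panel2list_sketch() is hypothetical.
panel2list_sketch <- function(panel, depvar_header, i_header, j_header, t_header) {
  # one data.frame per block i (drop = TRUE skips empty factor levels)
  blocks <- split(panel, panel[[i_header]], drop = TRUE)
  lapply(blocks, function(b) {
    # sort by individual, then by time, so values fill the matrix column by column
    b <- b[order(b[[j_header]], b[[t_header]]), ]
    dates <- sort(unique(b[[t_header]]))
    ids <- unique(b[[j_header]])
    matrix(b[[depvar_header]], nrow = length(dates), ncol = length(ids),
           dimnames = list(as.character(dates), as.character(ids)))
  })
}

toy <- expand.grid(Date = 1:3, LPA_Type = c("D", "F"), Region = c("NE", "LD"),
                   stringsAsFactors = FALSE)
toy$dlPrice <- seq_len(nrow(toy))
Y_list <- panel2list_sketch(toy, "dlPrice", "Region", "LPA_Type", "Date")
lapply(Y_list, dim)  # each block is T x N_i = 3 x 2
```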

Examples


panel <- UKhouse # load the data

# panel$Region identifies different blocks i=1,...,R.
# panel$LPA_Type identifies different individuals j=1,...,N_i.

Y_list <- panel2list(panel, depvar_header = "dlPrice", i_header = "Region",
                                       j_header = "LPA_Type", t_header = "Date")


Print the relative importance ratios

Description

Print the relative importance ratios

Usage

## S3 method for class 'multi_result'
summary(object, ...)

Arguments

object

An S3 object of class 'multi_result' created by multilevel().

...

Additional arguments.

Value

A matrix containing the summary of the model.

Examples


panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
                           depvar_header = "dlPrice", i_header = "Region",
                           j_header = "LPA_Type", t_header = "Date")
summary(est_multi)


Get the variance estimates of the global component

Description

This function generates the variance estimates of the global component for the j-th individual in block i at time t.

Usage

vcov_global_comp(object, i, j, t)

Arguments

object

An S3 object of class 'multi_result' created by multilevel().

i

An integer indicating the i-th block.

j

An integer indicating the j-th individual in the i-th block.

t

An integer indicating the time.

Value

The variance of the global component.

Examples

panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
                           depvar_header = "dlPrice", i_header = "Region",
                           j_header = "LPA_Type", t_header = "Date")
vcov_global_comp_ijt <- vcov_global_comp(est_multi, i = 1, j = 1, t = 1)

Get the covariance estimates for the global factors

Description

This function generates the covariance estimates for the global factors at time t.

Usage

vcov_global_factor(object, t)

Arguments

object

An S3 object of class 'multi_result' created by [multilevel()].

t

An integer specifying the time point.

Value

An r_{0} \times r_{0} covariance matrix.

Examples


panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
                           depvar_header = "dlPrice", i_header = "Region",
                           j_header = "LPA_Type", t_header = "Date")
vcov_G <- vcov_global_factor(est_multi, t = est_multi$T %/% 2) # integer mid-sample time point; avoids masking stats::vcov

Get the covariance estimates for the global factor loadings

Description

This function generates the covariance estimates for the global factor loadings for the j-th individual in block i.

Usage

vcov_global_loading(object, i, j)

Arguments

object

An S3 object of class 'multi_result' created by [multilevel()].

i

An integer indicating the i-th block.

j

An integer indicating the j-th individual in the i-th block.

Value

An r_{0} \times r_{0} covariance matrix.

Examples


panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
                           depvar_header = "dlPrice", i_header = "Region",
                           j_header = "LPA_Type", t_header = "Date")
vcov_gamma_11 <- vcov_global_loading(est_multi, i = 1, j = 1)

Get the variance estimates of the local component

Description

This function generates the variance estimates of the local component for the j-th individual in block i at time t.

Usage

vcov_local_comp(object, i, j, t)

Arguments

object

An S3 object of class 'multi_result' created by multilevel().

i

An integer indicating the i-th block.

j

An integer indicating the j-th individual in the i-th block.

t

An integer indicating the time.

Value

The variance of the local component.

Examples

panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
                           depvar_header = "dlPrice", i_header = "Region",
                           j_header = "LPA_Type", t_header = "Date")
vcov_local_comp_ijt <- vcov_local_comp(est_multi, i = 1, j = 1, t = 1)

Get the covariance estimates for the local factors

Description

This function generates the covariance estimates for the local factors in block i at time t.

Usage

vcov_local_factor(object, i, t)

Arguments

object

An S3 object of class 'multi_result' created by multilevel().

i

An integer indicating the i-th block.

t

An integer specifying the time point.

Value

An r_{i} \times r_{i} covariance matrix.

Examples

panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
                           depvar_header = "dlPrice", i_header = "Region",
                           j_header = "LPA_Type", t_header = "Date")
vcov_local_factor_11 <- vcov_local_factor(est_multi, i = 1, t = 1)

Get the covariance estimates for the local factor loadings

Description

This function generates the covariance estimates for the local loadings for the j-th individual in block i.

Usage

vcov_local_loading(object, i, j)

Arguments

object

An S3 object of class 'multi_result' created by multilevel().

i

An integer indicating the i-th block.

j

An integer indicating the j-th individual in the i-th block.

Value

An r_{i} \times r_{i} covariance matrix.

Examples

panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
                           depvar_header = "dlPrice", i_header = "Region",
                           j_header = "LPA_Type", t_header = "Date")
vcov_local_loading_11 <- vcov_local_loading(est_multi, i = 1, j = 1)