Title: | Computation of Random Matrix Models |
Version: | 0.0.1 |
Description: | We generate random variables following general Marchenko-Pastur distribution and Tracy-Widom distribution. We compute limits and distributions of eigenvalues and generalized components of spiked covariance matrices. We give estimation of all population eigenvalues of spiked covariance matrix model. We give tests of population covariance matrix. We also perform matrix denoising for signal-plus-noise model. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.1 |
Repository: | CRAN |
Collate: | 'Limits.R' 'CovEst.R' 'CovTest.R' 'GeneralMP.R' 'GeneralWishartMax.R' 'SignalPlusNoise.R' |
Imports: | MASS, RMTstat, lpSolve, mpoly, nleqslv, pracma, rARPACK, rootSolve, quadprog |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2022-11-11 19:51:47 UTC; Ethan |
Author: | Xiucai Ding [aut, cre, cph], Yichen Hu [aut, cph] |
Maintainer: | Xiucai Ding <xiucaiding89@gmail.com> |
Date/Publication: | 2022-11-14 17:30:05 UTC |
Estimation of the Spectrum of Population Covariance Matrix
Description
Estimation of the eigenvalues of population covariance matrix given samples.
Usage
MPEst(X, n=nrow(X), k=1, num=NULL, penalty=FALSE, n_spike=0)
MomentEst(X, n=nrow(X), k=1, n_spike=0)
Arguments
X |
n by p data matrix. |
n |
sample size. |
k |
repeated times in estimation. If |
num |
numbers of mass points chosen in estimation. |
penalty |
whether to implement L-1 penalty in inverting Marchenko-Pastur law |
n_spike |
number of spikes in population spectral. |
Details
Given E(X)=0
and Cov(X)=\Sigma
with \Sigma
unknown and fourth moment of X exists, we want to estimate spectrum of \Sigma
from sample covariance matrix X'X/n
.
MPEst
estimates spectrum by inverting Marchenko-Pastur law while MomentEst
estimates spectrum
by estimating the moment of population spectral density.
Those two functions give estimates of the eigenvalues by d
and estimates of spectral density by xs
and cdf
.
Value
MPEst
and MomentEst
give estimation of the spectrum of population covariance matrix and corresponding spectral density.
Author(s)
Xiucai Ding, Yichen Hu
References
[1] El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices using random matrix theory. The Annals of Statistics, 36(6), 2757-2790.
[2] Kong, W., & Valiant, G. (2017). Spectrum estimation from samples. The Annals of Statistics, 45(5), 2218-2247.
Examples
require(MASS)
n = 500
p = 250
X = mvrnorm(n, rep(0,p), diag(c(rep(2,p/2),rep(1,p/2))))
MPEst(X, n)$d
MomentEst(X, n)$d
High-dimensional Covariance Test
Description
Test of given population covariance matrix, test of equal covariance of two or more samples.
Usage
OneSampleCovTest(X, mean=NULL, S=NULL)
TwoSampleCovTest(X1, X2, mean=NULL)
MultiSampleCovTest(..., input=NULL)
Arguments
X , X1 , X2 |
input samples in the form n by p where p is the dimension. |
mean |
population mean of samples. If it is missing, sample mean will be used. |
S |
covariance matrix to be tested. If it is missing, test of identity covariance will be performed. |
... |
any samples to be tested. |
input |
list of samples to be tested. Please choose either |
Value
OneSampleCovTest
tests given covariance matrix of one sample,
TwoSampleCovTest
tests equal covariance matrices of two samples,
MultiSampleCovTest
tests equal covariance matrices of multiple samples.
Author(s)
Xiucai Ding, Yichen Hu
Source
Maximal likelihood tests fail in high-dimensional settings, so corrections are made. Note all tests are one-sided. Large statistics indicate violation of null hypothesis.
References
[1] Zheng, S., Bai, Z., & Yao, J. (2015). Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing. The Annals of Statistics, 43(2), 546-591.
Examples
require(MASS)
n = 500
p = 100
S1 = diag(rep(1,p))
S2 = diag(sample(c(1,4),p,replace=TRUE))
OneSampleCovTest(mvrnorm(n,rep(0,p),S2), S=S1)
TwoSampleCovTest(mvrnorm(n,rep(0,p),S1), mvrnorm(n,rep(0,p),S2))
MultiSampleCovTest(mvrnorm(n,rep(0,p),S1), mvrnorm(n,rep(0,p),S2))
General Marchenko-Pastur Distribution
Description
Density, distribution function, quantile function and random generation for the general Marchenko-Pastur distribution, the limiting distribution of empirical spectral measure for large Wishart matrices.
Usage
qgmp(p, ndf=NULL, pdim=NULL, svr=ndf/pdim, eigens=NULL, lower.tail=TRUE,
log.p=FALSE, m=500)
rgmp(n, ndf=NULL, pdim=NULL, svr=ndf/pdim, eigens=NULL, m=500)
pgmp(q, ndf=NULL, pdim=NULL, svr=ndf/pdim, eigens=NULL, lower.tail=TRUE,
log.p=FALSE, m=500)
dgmp(x, ndf=NULL, pdim=NULL, svr=ndf/pdim, eigens=NULL, log.p=FALSE, m=500)
Arguments
x , q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observation. |
m |
number of points used in estimating density. |
ndf |
the number of degrees of freedom for the Wishart matrix. |
pdim |
the number of dimensions (variables) for the Wishart matrix. |
svr |
samples to variables ratio; the number of degrees of freedom per dimension. |
log , log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are
|
eigens |
input eigenvalues of population covariance matrix. |
Details
Those functions work only for non-spiked part.
To achieve high accuracy of estimation, eigens
should be large, like
larger than 500.
In general Marchenko Pastur distributions, the support of density is the union of one or more intervals.
Value
dgmp
gives the density,
pgmp
gives the distribution function,
qgmp
gives the quantile function,
rgmp
generates random deviates,
Author(s)
Xiucai Ding, Yichen Hu
Source
If eigens
is missing, functions from package
RMTstat
will be used to compute classical Marchenko-Pastur
distribution.
References
[1] Knowles, A., & Yin, J. (2017). Anisotropic local laws for random matrices. Probability Theory and Related Fields, 169(1), 257-352.
[2] Bai, Z., & Yao, J. (2012). On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis, 106, 167-177.
[3] Ding, X. (2021). Spiked sample covariance matrices with possibly multiple bulk components. Random Matrices: Theory and Applications, 10(01), 2150014.
[4] Ding, X., & Trogdon, T. (2021). A Riemann–Hilbert approach to the perturbation theory for orthogonal polynomials: Applications to numerical linear algebra and random matrix theory. arXiv preprint arXiv:2112.12354.
Examples
N = 1000
M = 300
d = c(rep(3.8,M/3),rep(1.25,M/3),rep(0.25,M/3))
qgmp(0.5, ndf=N, pdim=M, eigens=d)
pgmp(3, ndf=N, pdim=M, eigens=d)
dgmp(2, ndf=N, pdim=M, eigens=d)
rgmp(2, ndf=N, pdim=M, eigens=d)
The Wishart Maximum Eigenvalue Distribution
Description
Density, distribution function, quantile function and random generation for the maximum eigenvalue from a general non-spiked Wishart matrix
(sample covariance matrix) with ndf
degrees of freedom,
pdim
dimensions, and order
parameter beta
.
Usage
dWishartMax(x, eigens, ndf, pdim, beta, log = FALSE)
pWishartMax(q, eigens, ndf, pdim, beta, lower.tail = TRUE, log.p = FALSE)
qWishartMax(p, eigens, ndf, pdim, beta, lower.tail = TRUE, log.p = FALSE)
rWishartMax(n, eigens, ndf, pdim, beta)
Arguments
x , q |
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. |
eigens |
eigenvalues of population covariance matrix. |
ndf |
the number of degrees of freedom for the Wishart matrix |
pdim |
the number of dimensions (variables) for the Wishart matrix |
beta |
the order parameter. 1 for real Wishart and 2 for complex Wishart. |
log , log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are
|
Details
A real Wishart matrix is equal in distribution to
X^T X/n
, where X
are
n\times p
real matrix with elements of mean zero and
covariance matrix \Sigma
.
A complex Wishart matrix is equal in distribution to
X^* X/n
, where both real and imagety part of X
are
n\times p
complex matrice with elements of mean zero and
covariance matrix \Sigma/2
. eigens
are eigenvalues of
\Sigma
. These functions give the limiting distribution of the largest
eigenvalue from the such a matrix when ndf
and pdim
both tend to
infinity.
Value
dWishartMax
gives the density,
pWishartMax
gives the distribution function,
qWishartMax
gives the quantile function,
rWishartMax
generates random deviates.
Author(s)
Xiucai Ding, Yichen Hu
References
[1] El Karoui, N. (2007). Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. The Annals of Probability, 35(2), 663-714.
[2] Lee, J. O., & Schnelli, K. (2016). Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population. The Annals of Applied Probability, 26(6), 3786-3839.
Examples
n = 500
p = 100
eigens = c(rep(2,p/2), rep(1, p/2))
beta = 2
rWishartMax(5, eigens, n, p, beta=beta)
qWishartMax(0.5, eigens, n, p, beta=beta)
pWishartMax(3.5, eigens, n, p, beta=beta)
dWishartMax(3.5, eigens, n, p, beta=beta)
Limits in High-dimensional Sample Covariance
Description
Some limits of eigenvalues and eigenvectors in high-dimensional sample covariance.
Usage
MP_vector_dist(k, v, ndf=NULL, pdim, svr=ndf/pdim, cov=NULL)
cov_spike(spikes, eigens, ndf, svr)
quadratic(k, cov, svr, spikes, type=1)
Arguments
k |
k-th eigenvector. In |
v |
vector to be projected on. |
ndf |
the number of degrees of freedom for the Wishart matrix. |
pdim |
the number of dimensions (variables) for the Wishart matrix. |
svr |
samples to variables ratio; the number of degrees of freedom per dimension. |
cov |
population covariace matrix. If it is null, it will be regarded as identity. |
eigens |
input eigenvalues of population covariance matrix without spikes. |
spikes |
spikes in population covariance matrix. |
type |
transformation of eigenvalues. n for n-th power. 0 for logarithm. |
Details
In MP_vector_dist
, the variance computed is for \sqrt{\code{pdim}}u_k^T v
, where u_k
is the k-th eigenvector.
Note in quadratic
, k should be within the spikes.
Value
MP_vector_dist
gives asymptotic variance of projection of eigenvectors of non-spiked Wishart matrix,
cov_spike
gives spikes in sample covariance matrix and their asymptotic variance.
quadratic
gives mean of certain quadratic forms of k-th sample eigenvector of spiked models. Note k should be within the spikes.
Author(s)
Xiucai Ding, Yichen Hu
References
[1] Knowles, A., & Yin, J. (2017). Anisotropic local laws for random matrices. Probability Theory and Related Fields, 169(1), 257-352.
[2] Jolliffe, I. (2005). Principal component analysis. Encyclopedia of statistics in behavioral science.
Examples
k = 1
n = 200
p = 100
v = runif(p)
v = v/sqrt(sum(v^2))
MP_vector_dist(k,v,n,p,cov=diag(p))
cov_spike(c(10),rep(1,p),n,n/p)
quadratic(k,diag(p),n/p,c(30))
Signal-Plus-Noise Models
Description
Estimation of signals, rank of signals.
Usage
StepWiseSVD(Y, threshold=NULL, B=1000, level=0.02, methods='kmeans',
u_threshold=NULL, v_threshold=NULL, sparse=TRUE)
ScreeNot(Y, r1)
GetRank(Y, r1, type=c("1","2"), level=0.1, B=500)
signal_value(d, svr)
signal_vector(k1, k2, d1, d2, svr, left=TRUE)
Arguments
Y |
matrix to be denoised. |
B |
repeat time of simulations. |
threshold |
threshold used in determining rank of signal. |
level |
significance level in determing ranks. |
methods |
methods used in determining sparse structure. |
u_threshold , v_threshold |
thresholds used in determining sparse structure if kmeans is not used. |
sparse |
whether signals have sparse structure. |
r1 |
upper bound of rank. |
type |
type of test. |
k1 , k2 |
k-th eigenvector. |
d , d1 , d2 |
eigenvalues of corresponding signal matrix |
left |
whether to use left singular vectors. |
svr |
ndf/ndim of Y. |
Details
StepWiseSVD
works well in sparse setting and requires i.i.d normal noise and a lot simulation time.SreeNot
is to pick the best TSVD result so works well in general setting.
When using signal-plus-noise related limits, make sure they are limits of signal-related values or vectors.
Value
StepWiseSVD
performs step-wise SVD to denoise and returns decomposed strcuture,
ScreeNot
performs ScreeNot to denoise and returns decomposed strcuture,
GetRank
gives rank of signals.
signal_value
gives corrected signal eigenvalue from SVD result,
signal_vector
gives limiting inner product between signal vector and corresponding signal-plus-noise vector.
Author(s)
Xiucai Ding, Yichen Hu
References
[1] Ding, X. (2020). High dimensional deformed rectangular matrices with applications in matrix denoising. Bernoulli, 26(1), 387-417.
[2] Donoho, D. L., Gavish, M., & Romanov, E. (2020). Screenot: Exact mse-optimal singular value thresholding in correlated noise. arXiv preprint arXiv:2009.12297.
[3] Ding, X., & Yang, F. (2022). Tracy-Widom distribution for heterogeneous Gram matrices with applications in signal detection. IEEE Transactions on Information Theory, vol. 68, no. 10, pp. 6682-6715.