Type: Package
Title: An Optimal Subset Selection for Distributed Hypothesis Testing
Version: 0.2.0
Maintainer: Guangbao Guo <ggb11111111@163.com>
Description: In the era of big data, data redundancy and the distributed nature of data pose new challenges for data analysis. This package introduces a method for estimating the optimal subset of redundant distributed data, based on PPCDT (Conjunction of Power and P-value in Distributed Settings). By combining statistical power and p-values (PPC), the approach efficiently extracts informative content from redundant distributed data and determines the optimal subset. Experimental results indicate that the method improves data quality and utilization efficiency, and that its performance can be assessed effectively. The philosophy of the package is described in Guo G. (2020) <doi:10.1007/s00180-020-00974-4>.
License: Apache License (== 2.0)
Depends: R (≥ 3.5.0)
Encoding: UTF-8
Imports: MASS,stats
NeedsCompilation: no
RoxygenNote: 7.3.1
Packaged: 2024-07-06 14:45:07 UTC; LJR
Author: Guangbao Guo [aut, cre, cph], Jiarui Li [ctb]
Repository: CRAN
Date/Publication: 2024-07-08 11:00:06 UTC

An Optimal Subset Selection for Distributed Hypothesis Testing

Description

We introduce an optimal subset selection method for distributed hypothesis testing, called PPCDT.

Usage

PPCDT(X,Y,alpha,K)

Arguments

X

A numeric matrix of independent variables (one row per observation, one column per predictor)

Y

The response variable, with one value per row of X

alpha

The significance level used in the tests, e.g. 0.05

K

The number of blocks into which variable X is divided
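The arguments describe a call in which the data are split into K blocks and tested at level alpha. The sketch below illustrates that idea under stated assumptions; it is not the package's implementation. The function name block_tests is hypothetical, blocks are assumed to be contiguous runs of rows, each block is fitted by ordinary least squares with lm(), and the test is the overall F-test reported by summary.lm(). As a stand-in for the power component of PPC, it reports the plug-in ("observed") power of that F-test, using the observed F statistic times its numerator degrees of freedom as the noncentrality estimate.

```r
# Hypothetical sketch, NOT the internal algorithm of PPCDT():
# split the rows of (X, Y) into K blocks, fit OLS on each block and
# report the overall F-test p-value and its plug-in power at level alpha.
block_tests <- function(X, Y, alpha = 0.05, K = 10) {
  X <- as.matrix(X)
  y <- as.numeric(Y)
  n <- nrow(X)
  # Assumption: K contiguous blocks of (nearly) equal size
  block_id <- cut(seq_len(n), breaks = K, labels = FALSE)
  res <- data.frame(block = seq_len(K), p_value = NA_real_, power = NA_real_)
  for (k in seq_len(K)) {
    idx <- which(block_id == k)
    fit <- lm(y[idx] ~ X[idx, , drop = FALSE])
    fs <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
    f_obs <- fs[["value"]]
    df1 <- fs[["numdf"]]
    df2 <- fs[["dendf"]]
    res$p_value[k] <- pf(f_obs, df1, df2, lower.tail = FALSE)
    # Assumption: plug-in noncentrality estimate ncp = F_obs * df1
    crit <- qf(1 - alpha, df1, df2)
    res$power[k] <- pf(crit, df1, df2, ncp = f_obs * df1, lower.tail = FALSE)
  }
  res
}

# Usage with data generated as in the Examples section
set.seed(1)
n <- 1000; p <- 5
X <- matrix(rnorm(n * p), ncol = p)
Y <- X %*% runif(p) + rnorm(n)
tab <- block_tests(X, Y, alpha = 0.05, K = 10)
print(tab[order(tab$p_value), ])
```

How the p-values and power values are combined to choose the optimal block is not specified on this page, so the sketch only tabulates both per block.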

Value

Xopt

The selected optimal subset of the independent variables

Yopt

The selected optimal subset of the response variable

Bopt

The regression coefficients estimated on the optimal subset

Eopt

The mean squared error (MSE) of the optimal subset

Aopt

The mean absolute error (MAE) of the optimal subset
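Quantities analogous to Bopt, Eopt and Aopt can be illustrated with the usual least-squares definitions. The helper subset_metrics below is hypothetical and is not exported by the package. It assumes that the coefficients are fitted by ordinary least squares without an intercept (matching the data-generating model in the Examples) and that both error measures are computed in-sample on the rows of the selected subset: MSE is the mean of the squared residuals and MAE the mean of their absolute values. Whether PPCDT evaluates the errors in-sample or on other data is not stated on this page.

```r
# Hypothetical helper (not part of PPCDT): given a selected subset
# (Xs, Ys), fit OLS without an intercept and return quantities
# analogous to Bopt, Eopt and Aopt.
subset_metrics <- function(Xs, Ys) {
  Xs <- as.matrix(Xs)
  ys <- as.numeric(Ys)
  fit <- lm(ys ~ Xs - 1)              # assumption: no intercept term
  beta_hat <- unname(coef(fit))
  resid <- ys - drop(Xs %*% beta_hat)
  list(
    B = beta_hat,                     # regression coefficients
    MSE = mean(resid^2),              # mean squared error
    MAE = mean(abs(resid))            # mean absolute error
  )
}

# Usage on a toy subset: a single constant column, so B is the mean of Ys
m <- subset_metrics(matrix(1, nrow = 4, ncol = 1), c(1, 2, 3, 4))
str(m)  # B approx. 2.5, MSE approx. 1.25, MAE approx. 1
```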

Author(s)

Guangbao Guo, Jiarui Li

Examples

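set.seed(1)                                  # for reproducibility
alpha <- 0.05                                # significance level
K <- 10; n <- 1000; p <- 5                   # blocks, observations, predictors
X <- matrix(rnorm(n * p, 0, 1), ncol = p)    # n x p matrix of predictors
beta <- matrix(runif(p), nrow = p)           # true coefficients drawn from U(0, 1)
eps <- matrix(rnorm(n), nrow = n)            # standard normal errors
Y <- X %*% beta + eps                        # linear-model response
PPCDT(X = X, Y = Y, alpha = alpha, K = K)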