Type: | Package |
Title: | Randomization-Based Inference Under Inexact Matching |
Version: | 2.0.0 |
Description: | Randomization-based inference for average treatment effects in potentially inexactly matched observational studies. It implements the inverse post-matching probability weighting framework proposed by the authors. The post-matching probability calculation follows the approach of Pimentel and Huang (2024) <doi:10.1093/jrsssb/qkae033>. The optimal full matching method is based on Hansen (2004) <doi:10.1198/106186006X137047>. The variance estimator extends the method proposed in Fogarty (2018) <doi:10.1111/rssb.12290> from the perfect randomization settings to the potentially inexact matching case. Comparisons are made with conventional methods, as described in Rosenbaum (2002) <doi:10.1007/978-1-4757-3692-2>, Fogarty (2018) <doi:10.1111/rssb.12290>, and Kang et al. (2016) <doi:10.1214/15-aoas894>. |
Imports: | MASS, xgboost, optmatch |
Suggests: | VGAM, mvtnorm |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-03-10 20:45:40 UTC; jiananzhu |
Author: | Jianan Zhu [aut, cre], Jeffrey Zhang [aut], Zijian Guo [aut], Siyu Heng [aut] |
Maintainer: | Jianan Zhu <jz4698@nyu.edu> |
Repository: | CRAN |
Date/Publication: | 2025-03-12 17:40:05 UTC |
Randomization-based inference using inverse post-matching probability weighting (IPPW)
Description
This function implements the inverse post-matching probability weighting (IPPW) method proposed in Zhu, Zhang, Guo and Heng (2024). It can be used for conducting randomization-based inference for the sample average treatment effect in potentially inexactly matched observational studies. By default, the matching design implemented in the package is optimal full matching, and the estimated propensity scores used in our method are obtained by the XGBoost method.
Usage
IPPW(Y, Z, X, min.controls = 0.0001,max.controls = 10000, caliper = TRUE,
calipersd = 0.2, dim = FALSE, gamma = 0.1, alpha = 0.05)
Arguments
Y |
The observed outcome vector. |
Z |
The treatment indicator vector. |
X |
The covariates matrix. Each row is an individual. |
min.controls |
The minimum ratio of controls to treatments permitted within a matched set. The default is 0.0001. |
max.controls |
The maximum ratio of controls to treatments permitted within a matched set. The default is 10000. |
caliper |
Whether adding a propensity score caliper or not. The default is TRUE. |
calipersd |
The standard deviation of the logit propensity score for caliper. The default is 0.2. |
dim |
Whether using difference-in-means estimator to estimate the average treatment effect. The default is FALSE. |
gamma |
The regularization threshold. The default is 0.1. |
alpha |
The prespecified level alpha for the 1-alpha condifence interval. |
Value
estimate |
The estimate for the sample average treatment effect using the IPPW estimator. |
var |
The variance for the sample average treatment effect using the IPPW estimator. |
low |
The lower bound for the sample average treatment effect using the IPPW estimator. |
up |
The upper bound for the sample average treatment effect using the IPPW estimator. |
CI |
The confidence interval for the sample average treatment effect using the IPPW estimator. |
balance |
The pre- and post-matching covariate balance table. |
Author(s)
Jianan Zhu (maintainer), Jeffrey Zhang, Zijian Guo and Siyu Heng.
References
Fogarty, C. B. (2018). On mitigating the analytical limitations of finely stratified experiments. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(5), 1035-1056.
Hansen, B. B. (2004). Full matching in an observational study of coaching for the SAT. Journal of the American Statistical Association, 99(467), 609-618.
Hansen, B. B. and Klopfer, S. O. (2006). Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics, 15(3), 609-627.
Kang, H., Kreuels, B., May, J., and Small, D. S. (2016). Full matching approach to instrumental variables estimation with application to the effect of malaria on stunting. The Annals of Applied Statistics, 10(1), 335-364.
Rosenbaum, P. R. (1991). A characterization of optimal designs for observational studies. Journal of the Royal Statistical Society: Series B (Methodological), 53(3), 597-610.
Zhu, J., Zhang, J., Guo, Z., and Heng, S. (2024). Randomization-Based Inference for Average Treatment Effect in Inexactly Matched Observational Studies. arXiv preprint, arXiv:2308.02005.
Examples
library(MASS)
library(xgboost)
library(optmatch)
# Generate data
set.seed(1)
d = 3
n = 30
sigma = diag(d)
# Generate X
X_d = mvtnorm::rmvnorm(n, mean = rep(0,d), sigma = sigma)
# Generate Z
C = -2.5
fx = 0.1*(X_d[,1])^3 + 0.3*(X_d[,2]) + 0.2*log((X_d[,3])^2) +
abs(X_d[,1]*X_d[,2]) + rnorm(n,0,1) + C
p = exp(fx)/(1+exp(fx)) # the probability of receiving the treatment
Z = rep(0,length(p))
for(i in seq_along(p)){
Z[i] = rbinom(1,1,p[i])
}
# Generate Y
Y_0 = 0.2*(X_d[,1])^3 + 0.2*abs(X_d[,2]) + 0.5*abs(X_d[,3]) + rnorm(n,0,1)
Y_1 = Y_0 + 1 + 0.3*X_d[,1] + 0.2*X_d[,3]^3
Y = (1-Z)*Y_0 + Z*Y_1
# The output
est = IPPW(Y,Z,X_d,min.controls = 0.01,max.controls = 100,caliper=FALSE,
calipersd = 0.2,dim=FALSE,gamma=0.1,alpha=0.05)$estimate
est
The bias-corrected Wald estimator for the complier average treatment effect
Description
This function implements the bias-corrected Wald estimator for randomization-based estimation and inference for the complier average treatment effect under inexact matching, proposed in Zhu, Zhang, Guo and Heng (2024). By default, the matching design implemented in the package is optimal full matching, and the estimated propensity scores used in our method are obtained by the XGBoost method.
Usage
IPPW_IV(Y, Z, X, D, min.controls = 0.0001, max.controls = 10000,
caliper = TRUE, calipersd = 0.2, classical = FALSE, gamma = 0.1,
lower.bound, upper.bound, by,alpha)
Arguments
Y |
The observed outcome vector. |
Z |
The binary instrument vector. |
X |
The covariates matrix. Each row is an individual. |
D |
The binary treatment indicator vector. |
min.controls |
The minimum ratio of the unencouraged to the encouraged permitted within a matched set. The default is 0.0001. |
max.controls |
The maximum ratio of the unencouraged to the encouraged permitted within a matched set. The default is 10000. |
caliper |
Whether adding a propensity score caliper or not. The default is TRUE. |
calipersd |
The standard deviation of the logit propensity score for caliper. The default is 0.2. |
classical |
Whether using the classical estimator (proposed by Kang et al. (2016)) to estimate the complier average treatment effect. The default is FALSE. |
gamma |
The regularization threshold. The default is 0.1. |
lower.bound |
The starting value of the search region for the point estimate. |
upper.bound |
The end value of the search region for the point estimate. |
by |
The increment of the search region for the point estimate. |
alpha |
The prespecified level alpha for the 1-alpha confidence interval. |
Value
estimate |
The estimate for the complier average treatment effect using the bias-corrected Wald estimator. |
CI |
The confidence interval for the complier average treatment effect using the bias-corrected Wald estimator. |
value |
The corresponding z-scores for the hypothetical values of the complier average treatment effect. |
balance |
The pre- and post-matching covariate balance table. |
Author(s)
Jianan Zhu (maintainer), Jeffrey Zhang, Zijian Guo and Siyu Heng.
References
Fogarty, C. B. (2018). On mitigating the analytical limitations of finely stratified experiments. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(5), 1035-1056.
Hansen, B. B. (2004). Full matching in an observational study of coaching for the SAT. Journal of the American Statistical Association, 99(467), 609-618.
Hansen, B. B. and Klopfer, S. O. (2006). Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics, 15(3), 609-627.
Kang, H., Kreuels, B., May, J., and Small, D. S. (2016). Full matching approach to instrumental variables estimation with application to the effect of malaria on stunting. The Annals of Applied Statistics, 10(1), 335-364.
Rosenbaum, P. R. (1991). A characterization of optimal designs for observational studies. Journal of the Royal Statistical Society: Series B (Methodological), 53(3), 597-610.
Zhu, J., Zhang, J., Guo, Z., and Heng, S. (2024). Randomization-Based Inference for Average Treatment Effect in Inexactly Matched Observational Studies. arXiv preprint, arXiv:2308.02005.
Examples
library(MASS)
library(xgboost)
library(optmatch)
# Generate data
set.seed(1)
d = 3
n = 30
sigma = diag(d)
# generate X
X_d = mvtnorm::rmvnorm(n, mean = rep(0,d), sigma = sigma)
# generate Z
C = -2.5
fx = 0.1*(X_d[,1])^3 + 0.3*(X_d[,2]) + 0.2*log((X_d[,3])^2) +
abs(X_d[,1]*X_d[,2]) + rnorm(n,0,1) + C
p = exp(fx)/(1+exp(fx)) # the probability of receiving the treatment
Z = rep(0,length(p))
for(i in seq_along(p)){
Z[i] = rbinom(1,1,p[i])
}
#joint distribution
matrix = matrix(c(1,0.8,0.8,1),2,2)
sigma = mvrnorm(n,c(0,0),matrix)
# generate the treatment effect D:
fx_D0 = 0.1*X_d[,3] + 0.4*sin(X_d[,2]) + 0.4*abs(X_d[,3])- 1 + sigma[,1]
ibu <- rnorm(n,0,1)
D_0 = ifelse(fx_D0>ibu,1,0)
fx_D1 = fx_D0 + 2 + 0.8*X_d[,2]^2
D_1 = ifelse(fx_D1>ibu,1,0)
D = (1-Z)*D_0 + Z*D_1
# generate continuous outcome Y:
Y_0 = 0.4*(X_d[,1])^2 + 0.1*abs(X_d[,2]) + 0.2*cos(X_d[,3]) + sigma[,2]
Y_1 = Y_0 + 1 + 0.1*X_d[,1] + 0.3*X_d[,3]^2
Y = (1-D)*Y_0 + D*Y_1
# The output
est = IPPW_IV(Y,Z,X_d,D,min.controls = 0.01, max.controls = 100,caliper=FALSE,
calipersd = 0.2, classical = FALSE,gamma = 0.1,lower.bound=0,upper.bound=3,
by=0.01,alpha=0.05)$estimate
est
The function for calculating the post-matching treatment assignment probabilities
Description
This function calculates the regularized post-matching treatment assignment probabilities, as described in Zhu, Zhang, Guo, and Heng (2024).
Usage
conditional_p(treated.subject.index,matched.control.subject.index,p,alpha=0.1)
Arguments
treated.subject.index |
The index list for treated subjects after matching. |
matched.control.subject.index |
The index list for control subjects after matching. |
p |
The estimated propensity score vector. |
alpha |
Prespecified small number as the regularization threshold. The default is 0.1. |
Value
prob |
The regularized post-matching treatment assignment probabilities vector. |
Author(s)
Jianan Zhu (maintainer), Jeffrey Zhang, Zijian Guo and Siyu Heng.
References
Zhu, J., Zhang, J., Guo, Z., and Heng, S. (2024). Randomization-Based Inference for Average Treatment Effect in Inexactly Matched Observational Studies. arXiv:2308.02005.
Examples
library(MASS)
library(xgboost)
library(optmatch)
# Generate data
set.seed(1)
d = 5
n = 50
sigma = diag(d)
# Generate X
X_d = mvtnorm::rmvnorm(n, mean = rep(0,d), sigma = sigma)
X_d[,4] = VGAM::rlaplace(n, location = 0, scale = sqrt(2)/2)
X_d[,5] = VGAM::rlaplace(n, location = 0, scale = sqrt(2)/2)
# Generate Z
C = -2.5
fx = 0.1*(X_d[,1])^3 + 0.3*(X_d[,2]) + 0.2*log((X_d[,3])^2) + 0.1*(X_d[,4]) +
0.2*X_d[,5] + abs(X_d[,1]*X_d[,2]) + (X_d[,3]*X_d[,4])^2 + 0.5*(X_d[,2]*X_d[,4])^2 +
rnorm(n,0,1) + C
p = exp(fx)/(1+exp(fx)) # the probability of receiving the treatment
Z = rep(0,length(p))
for(i in seq_along(p)){
Z[i] = rbinom(1,1,p[i])
}
# Generate Y
Y_0 = 0.2*(X_d[,1])^3 + 0.2*abs(X_d[,2]) + 0.2*X_d[,3]^3 + 0.5*abs(X_d[,4]) +
0.3*X_d[,5] + rnorm(n,0,1)
Y_1 = Y_0 + 1 + 0.3*X_d[,1] + 0.2*X_d[,3]^3
Y = (1-Z)*Y_0 + Z*Y_1
# Smahal function
smahal=
function(z,X){
X<-as.matrix(X)
n<-dim(X)[1]
rownames(X)<-1:n
k<-dim(X)[2]
m<-sum(z)
for (j in 1:k) X[,j]<-rank(X[,j])
cv<-cov(X)
vuntied<-var(1:n)
rat<-sqrt(vuntied/diag(cv))
cv<-diag(rat)%*%cv%*%diag(rat)
out<-matrix(NA,m,n-m)
Xc<-X[z==0,]
Xt<-X[z==1,]
rownames(out)<-rownames(X)[z==1]
colnames(out)<-rownames(X)[z==0]
icov<-MASS::ginv(cv)
for (i in 1:m) out[i,]<-mahalanobis(Xc,Xt[i,],icov,inverted=TRUE)
out
}
# Matching
treated.index = which(Z == 1)
propscore.model = glm(Z ~ X_d, family = 'binomial',x=TRUE,y=TRUE)
treated = propscore.model$y
Xmat=propscore.model$x[,-1]
distmat=smahal(treated,Xmat)
logit.propscore=predict(propscore.model)
subject.index=seq(1,length(treated),1)
rownames(distmat)=subject.index[treated==1]
colnames(distmat)=subject.index[treated==0]
matchvec=fullmatch(distmat,min.controls=0.0001,max.controls=10000)
treated.subject.index=vector("list",length(treated.index))
matched.control.subject.index=vector("list",length(treated.index))
matchedset.index=substr(matchvec,start=3,stop=10)
matchedset.index.numeric=as.numeric(matchedset.index)
subjects.match.order=as.numeric(names(matchvec))
matchedset_index = length(unique(matchedset.index.numeric))
# total number in each set
l <- rep(0,length(treated.subject.index))
for(i in 1:length(treated.subject.index)){
matched.set.temp=which(matchedset.index.numeric==i)
matched.set.temp.indices=subjects.match.order[matched.set.temp]
l[i] <- length(matched.set.temp.indices)
}
# the order of matchvec
for(i in 1:length(treated.index)){
matched.set.temp=which(matchedset.index.numeric==i)
matched.set.temp.indices=subjects.match.order[matched.set.temp]
treated.temp.index=which(matched.set.temp.indices %in% treated.index)
if(length(treated.temp.index) != 0){
treated.subject.index[[i]]=matched.set.temp.indices[treated.temp.index]
matched.control.subject.index[[i]]=matched.set.temp.indices[-treated.temp.index]
}
}
# remove null
if(sum(sapply(treated.subject.index, is.null)) != 0){
treated.subject.index<- treated.subject.index[-which(sapply(treated.subject.index, is.null))]
matched.control.subject.index<-matched.control.subject.index[-which(sapply(
matched.control.subject.index,is.null))]
}
# Use XGBoost to estimate propensity score
length_all = length(Z)
length_X = ncol(X_d)
df = data.frame(Z,X_d)
index_model1 = sample(length_all,length_all/2)
df1 = df[index_model1,]
df2 = df[-index_model1,]
prob = rep(0,length_all)
xgb.model1 = xgboost(data = as.matrix(df1[2:length_X]), label = df1$Z, nrounds = 2,
objective = "binary:logistic",verbose = 0)
prob[-index_model1] = predict(xgb.model1, as.matrix(df2[2:length_X]))
xgb.model2 = xgboost(data = as.matrix(df2[2:length_X]), label = df2$Z, nrounds = 2,
objective = "binary:logistic",verbose = 0)
prob[index_model1] = predict(xgb.model2, as.matrix(df1[2:length_X]))
# calculate the post-matching treatment assignment probabilities
p = conditional_p(treated.subject.index,matched.control.subject.index,prob,alpha=0.1)