Type: Package
Title: Approximate the Variance of the Horvitz-Thompson Total Estimator
Version: 0.1.4
Date: 2023-08-26
Description: Variance approximations for the Horvitz-Thompson total estimator in Unequal Probability Sampling using only first-order inclusion probabilities. See Matei and Tillé (2005) and Haziza, Mecatti and Rao (2008) for details.
License: GPL-3
Encoding: UTF-8
RoxygenNote: 7.2.3
BugReports: https://github.com/rhobis/UPSvarApprox/issues
NeedsCompilation: no
Packaged: 2023-08-26 08:22:20 UTC; Roberto
Author: Roberto Sichera [aut, cre]
Maintainer: Roberto Sichera <rob.sichera@gmail.com>
Repository: CRAN
Date/Publication: 2023-08-27 07:00:07 UTC
UPSvarApprox: Approximate the variance of the Horvitz-Thompson estimator
Description
Variance approximations for the Horvitz-Thompson total estimator in Unequal Probability Sampling using only first-order inclusion probabilities. See Matei and Tillé (2005) and Haziza, Mecatti and Rao (2008) for details.
Variance approximation
The package provides the function Var_approx for the approximation of the Horvitz-Thompson variance, and the function approx_var_est for the computation of approximate variance estimators. For both functions, different estimators are implemented; see their documentation for details.
Author(s)
Maintainer: Roberto Sichera <rob.sichera@gmail.com>
References
Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.
Haziza, D.; Mecatti, F.; Rao, J.N.K. 2008. Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron LXVI (1), 91-108.
See Also
Useful links:
Report bugs at https://github.com/rhobis/UPSvarApprox/issues
Approximate the Variance of the Horvitz-Thompson estimator
Description
Approximations of the Horvitz-Thompson variance for High-Entropy sampling designs. Such methods use only first-order inclusion probabilities.
Usage
Var_approx(y, pik, n, method, ...)
Arguments
y: numeric vector containing the values of the variable of interest for all population units
pik: numeric vector of first-order inclusion probabilities, of length equal to the population size
n: a scalar indicating the sample size
method: string indicating the approximation that should be used. One of "Hajek1", "Hajek2", "HartleyRao1", "HartleyRao2", "FixedPoint".
...: two optional parameters can be modified to control the iterative procedure in method = "FixedPoint": maxIter, the maximum number of iterations, and eps, the convergence tolerance (see var_FixedPoint).
Details
The variance approximations available in this function are described below; the notation used is that of Matei and Tillé (2005).
- Hájek variance approximation (method = "Hajek1"):
  \tilde{Var} = \sum_{i \in U} \frac{b_i}{\pi_i^2} (y_i - y_i^*)^2
  where y_i^* = \pi_i \frac{\sum_{j \in U} b_j y_j / \pi_j}{\sum_{j \in U} b_j} and b_i = \frac{\pi_i (1 - \pi_i) N}{N - 1}
- Starting from Hájek (1964), Brewer (2002) defined the following estimator (method = "Hajek2"):
  \tilde{Var} = \sum_{i \in U} \pi_i (1 - \pi_i) \Bigl( \frac{y_i}{\pi_i} - \frac{\tilde{Y}}{n} \Bigr)^2
  where \tilde{Y} = \sum_{i \in U} a_i y_i and a_i = n (1 - \pi_i) / \sum_{j \in U} \pi_j (1 - \pi_j)
- Hartley and Rao (1962) variance approximation (method = "HartleyRao1"):
  \tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n} \pi_i \Bigr) \Bigl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Bigr)^2
  - \frac{n-1}{n^2} \sum_{i \in U} \Bigl( 2 \pi_i^3 - \frac{\pi_i^2}{2} \sum_{j \in U} \pi_j^2 \Bigr) \Bigl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Bigr)^2
  + \frac{2(n-1)}{n^3} \Bigl( \sum_{i \in U} \pi_i y_i - \frac{Y}{n} \sum_{i \in U} \pi_i^2 \Bigr)^2
- Hartley and Rao (1962) also provide a simplified version of the variance above (method = "HartleyRao2"):
  \tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n} \pi_i \Bigr) \Bigl( \frac{y_i}{\pi_i} - \frac{Y}{n} \Bigr)^2
- method = "FixedPoint" computes the fixed-point variance approximation proposed by Deville and Tillé (2005). The variance can be expressed in the same form as in method = "Hajek1", and the coefficients b_i are computed iteratively by the algorithm:
  - b_i^{(0)} = \pi_i (1 - \pi_i) \frac{N}{N-1}, \,\, \forall i \in U
  - b_i^{(k)} = \frac{(b_i^{(k-1)})^2}{\sum_{j \in U} b_j^{(k-1)}} + \pi_i (1 - \pi_i)
  A necessary condition for convergence is checked and, if it is not satisfied, the function returns an alternative solution that uses only one iteration:
  b_i = \pi_i (1 - \pi_i) \Biggl( \frac{N \pi_i (1 - \pi_i)}{(N-1) \sum_{j \in U} \pi_j (1 - \pi_j)} + 1 \Biggr)
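For illustration, the approximations above can be transcribed directly in R. The following is a minimal sketch, assuming the population vectors y and pik and the sample size n generated as in the Examples below; the object names are illustrative, and the necessary-condition check mentioned above is omitted, so results may differ from Var_approx in edge cases.
N <- length(pik)
## "Hajek1": b_i, y_i^* and the approximated variance
b <- pik * (1 - pik) * N / (N - 1)
y_star <- pik * sum(b * y / pik) / sum(b)
V_hajek1 <- sum(b / pik^2 * (y - y_star)^2)
## "FixedPoint": iterate the b_i coefficients, then reuse the "Hajek1" form
b <- pik * (1 - pik) * N / (N - 1)                # b_i^(0)
for (k in seq_len(1000)) {
  b_new <- b^2 / sum(b) + pik * (1 - pik)         # b_i^(k)
  converged <- max(abs(b_new - b)) < 1e-05
  b <- b_new
  if (converged) break
}
y_star <- pik * sum(b * y / pik) / sum(b)
V_fixed <- sum(b / pik^2 * (y - y_star)^2)
## These should match Var_approx(y, pik, n, method = "Hajek1") and
## Var_approx(y, pik, n, method = "FixedPoint"), up to implementation details.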
Value
a scalar, the approximated variance.
References
Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.
Examples
N <- 500; n <- 50
set.seed(0)
x <- rgamma(n=N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )
pik <- n * x/sum(x)
pikl <- outer(pik, pik, '*'); diag(pikl) <- pik   # approximate joint inclusion probabilities (not needed by Var_approx)
### Variance approximations ---
Var_approx(y, pik, n, method = "Hajek1")
Var_approx(y, pik, n, method = "Hajek2")
Var_approx(y, pik, n, method = "HartleyRao1")
Var_approx(y, pik, n, method = "HartleyRao2")
Var_approx(y, pik, n, method = "FixedPoint")
Approximated Variance Estimators
Description
Approximated variance estimators which use only first-order inclusion probabilities
Usage
approx_var_est(y, pik, method, sample = NULL, ...)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities, of length N (the population size) or n (the sample size), depending on the chosen method (see Details for more information)
method: string indicating the desired approximate variance estimator. One of "Deville1", "Deville2", "Deville3", "Hajek", "Rosen", "FixedPoint", "Brewer1", "HartleyRao", "Berger", "Tille", "MateiTille1", "MateiTille2", "MateiTille3", "MateiTille4", "MateiTille5", "Brewer2", "Brewer3", "Brewer4".
sample: either a numeric vector of length equal to the sample size containing the indices of the sample units, or a boolean vector of the same length as pik, indicating which population units are in the sample. Only needed by the class 3 estimators, for which pik refers to the whole population (see the Examples).
...: two optional parameters can be modified to control the iterative procedures used by some methods (such as "FixedPoint", "Tille" and "MateiTille1"): maxIter, the maximum number of iterations, and eps, the convergence tolerance.
Details
The choice of the estimator to be used is made through the argument method; the list of methods and their respective equations is presented below.
Matei and Tillé (2005) divide the approximated variance estimators into three classes, depending on the quantities they require:
1. first- and second-order inclusion probabilities: this class is composed of the Horvitz-Thompson estimator (Horvitz and Thompson 1952) and the Sen-Yates-Grundy estimator (Yates and Grundy 1953; Sen 1953), which are available through function varHT in package sampling;
2. only first-order inclusion probabilities, and only for sample units;
3. only first-order inclusion probabilities, for the entire population.
Haziza, Mecatti and Rao (2008) provide a common form to express most of the estimators in classes 2 and 3:
\widehat{var}(\hat{t}_{HT}) = \sum_{i \in s} c_i e_i^2
where e_i = \frac{y_i}{\pi_i} - \hat{B}, with \hat{B} = \frac{\sum_{i \in s} a_i (y_i/\pi_i)}{\sum_{i \in s} a_i}, and a_i and c_i are parameters that define the different estimators (a numerical sketch of this form follows the list below):
- method = "Hajek" [Class 2]:
  c_i = \frac{n}{n-1} (1-\pi_i); \quad a_i = c_i
- method = "Deville2" [Class 2]:
  c_i = (1-\pi_i) \Biggl\{ 1 - \sum_{j \in s} \Bigl[ \frac{1-\pi_j}{\sum_{k \in s} (1-\pi_k)} \Bigr]^2 \Biggr\}^{-1}; \quad a_i = c_i
- method = "Deville3" [Class 2]:
  c_i = (1-\pi_i) \Biggl\{ 1 - \sum_{j \in s} \Bigl[ \frac{1-\pi_j}{\sum_{k \in s} (1-\pi_k)} \Bigr]^2 \Biggr\}^{-1}; \quad a_i = 1
- method = "Rosen" [Class 2]:
  c_i = \frac{n}{n-1} (1-\pi_i); \quad a_i = (1-\pi_i) \log(1-\pi_i) / \pi_i
- method = "Brewer1" [Class 2]:
  c_i = \frac{n}{n-1} (1-\pi_i); \quad a_i = 1
- method = "Brewer2" [Class 3]:
  c_i = \frac{n}{n-1} \Bigl( 1 - \pi_i + \frac{\pi_i}{n} - n^{-2} \sum_{j \in U} \pi_j^2 \Bigr); \quad a_i = 1
- method = "Brewer3" [Class 3]:
  c_i = \frac{n}{n-1} \Bigl( 1 - \pi_i - \frac{\pi_i}{n} - n^{-2} \sum_{j \in U} \pi_j^2 \Bigr); \quad a_i = 1
- method = "Brewer4" [Class 3]:
  c_i = \frac{n}{n-1} \Bigl( 1 - \pi_i - \frac{\pi_i}{n-1} + n^{-1} (n-1)^{-1} \sum_{j \in U} \pi_j^2 \Bigr); \quad a_i = 1
- method = "Berger" [Class 3]:
  c_i = \frac{n}{n-1} (1-\pi_i) \Biggl[ \frac{\sum_{j \in s} (1-\pi_j)}{\sum_{j \in U} (1-\pi_j)} \Biggr]; \quad a_i = c_i
- method = "HartleyRao" [Class 3]:
  c_i = \frac{n}{n-1} \Bigl( 1 - \pi_i - n^{-1} \sum_{j \in s} \pi_i + n^{-1} \sum_{j \in U} \pi_j^2 \Bigr); \quad a_i = 1
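As a numerical illustration of this common form, the coefficients for method = "Hajek" (c_i = a_i = n/(n-1) (1-\pi_i)) can be plugged in directly. A minimal R sketch, assuming the sample vectors ys and piks as in the Examples below; object names are illustrative.
n  <- length(ys)
ci <- n / (n - 1) * (1 - piks)        # c_i for method = "Hajek"
ai <- ci                              # a_i = c_i for this method
B  <- sum(ai * ys / piks) / sum(ai)   # \hat{B}
e  <- ys / piks - B                   # residuals e_i
v_hajek <- sum(ci * e^2)
## Should agree with approx_var_est(ys, piks, method = "Hajek"),
## up to implementation details.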
Some additional estimators are defined in Matei and Tillé (2005):
- method = "Deville1" [Class 2]:
  \widehat{var}(\hat{t}_{HT}) = \sum_{i \in s} \frac{c_i}{\pi_i^2} (y_i - y_i^*)^2
  where y_i^* = \pi_i \frac{\sum_{j \in s} c_j y_j / \pi_j}{\sum_{j \in s} c_j} and c_i = (1-\pi_i) \frac{n}{n-1}
- method = "Tille" [Class 3]:
  \widehat{var}(\hat{t}_{HT}) = \biggl( \sum_{i \in s} \omega_i \biggr) \sum_{i \in s} \omega_i (\tilde{y}_i - \bar{\tilde{y}}_\omega)^2 - n \sum_{i \in s} \biggl( \tilde{y}_i - \frac{\hat{t}_{HT}}{n} \biggr)^2
  where \tilde{y}_i = y_i / \pi_i, \omega_i = \pi_i / \beta_i and \bar{\tilde{y}}_\omega = \biggl( \sum_{i \in s} \omega_i \biggr)^{-1} \sum_{i \in s} \omega_i \tilde{y}_i.
  The coefficients \beta_i are computed iteratively through the following procedure:
  - \beta_i^{(0)} = \pi_i, \,\, \forall i \in U
  - \beta_i^{(2k-1)} = \frac{(n-1) \pi_i}{\beta^{(2k-2)} - \beta_i^{(2k-2)}}
  - \beta_i^{(2k)} = \beta_i^{(2k-1)} \Biggl( \frac{n(n-1)}{(\beta^{(2k-1)})^2 - \sum_{i \in U} (\beta_i^{(2k-1)})^2} \Biggr)^{1/2}
  with \beta^{(k)} = \sum_{i \in U} \beta_i^{(k)}, \,\, k = 1, 2, 3, \dots
- method = "MateiTille1" [Class 3]:
  \widehat{var}(\hat{t}_{HT}) = \frac{n(N-1)}{N(n-1)} \sum_{i \in s} \frac{b_i}{\pi_i^3} (y_i - \hat{y}_i^*)^2
  where \hat{y}_i^* = \pi_i \frac{\sum_{i \in s} b_i y_i / \pi_i^2}{\sum_{i \in s} b_i / \pi_i} and the coefficients b_i are computed iteratively by the algorithm:
  - b_i^{(0)} = \pi_i (1-\pi_i) \frac{N}{N-1}, \,\, \forall i \in U
  - b_i^{(k)} = \frac{(b_i^{(k-1)})^2}{\sum_{j \in U} b_j^{(k-1)}} + \pi_i (1-\pi_i)
  A necessary condition for convergence is checked and, if it is not satisfied, the function returns an alternative solution that uses only one iteration:
  b_i = \pi_i (1-\pi_i) \Biggl( \frac{N \pi_i (1-\pi_i)}{(N-1) \sum_{j \in U} \pi_j (1-\pi_j)} + 1 \Biggr)
- method = "MateiTille2" [Class 3]:
  \widehat{var}(\hat{t}_{HT}) = \frac{1}{1 - \sum_{i \in U} \frac{d_i^2}{\pi_i}} \sum_{i \in s} (1-\pi_i) \Biggl( \frac{y_i}{\pi_i} - \frac{\hat{t}_{HT}}{n} \Biggr)^2
  where d_i = \frac{\pi_i (1-\pi_i)}{\sum_{j \in U} \pi_j (1-\pi_j)}
- method = "MateiTille3" [Class 3]:
  \widehat{var}(\hat{t}_{HT}) = \frac{1}{1 - \sum_{i \in U} \frac{d_i^2}{\pi_i}} \sum_{i \in s} (1-\pi_i) \Biggl( \frac{y_i}{\pi_i} - \frac{\sum_{j \in s} (1-\pi_j) \frac{y_j}{\pi_j}}{\sum_{j \in s} (1-\pi_j)} \Biggr)^2
  where d_i is defined as in method = "MateiTille2".
- method = "MateiTille4" [Class 3]:
  \widehat{var}(\hat{t}_{HT}) = \frac{1}{1 - \sum_{i \in U} b_i / n^2} \sum_{i \in s} \frac{b_i}{\pi_i^3} (y_i - y_i^*)^2
  where y_i^* = \pi_i \frac{\sum_{j \in s} b_j y_j / \pi_j^2}{\sum_{j \in s} b_j / \pi_j} and b_i = \frac{\pi_i (1-\pi_i) N}{N-1}
- method = "MateiTille5" [Class 3]: this estimator is defined as in method = "MateiTille4", and the b_i coefficients are computed as in method = "MateiTille1".
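As a check on the notation, the simpler estimators above can also be transcribed directly. A minimal R sketch of the "MateiTille2" formula, assuming the population vector pik, sample indices s, and sample vectors ys and piks as in the Examples below; object names are illustrative.
n   <- length(ys)
d   <- pik * (1 - pik) / sum(pik * (1 - pik))   # d_i, over the population
tht <- sum(ys / piks)                           # Horvitz-Thompson total estimate
v_mt2 <- sum((1 - piks) * (ys / piks - tht / n)^2) / (1 - sum(d^2 / pik))
## Should agree with approx_var_est(ys, pik, method = "MateiTille2", sample = s),
## up to implementation details.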
Value
a scalar, the estimated variance
References
Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.
Haziza, D.; Mecatti, F.; Rao, J.N.K. 2008. Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron LXVI (1), 91-108.
Examples
### Generate population data ---
N <- 500; n <- 50
set.seed(0)
x <- rgamma(n=N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )
pik <- n * x/sum(x)
s <- sample(N, n)
ys <- y[s]
piks <- pik[s]
### Estimators of class 2 ---
approx_var_est(ys, piks, method="Deville1")
approx_var_est(ys, piks, method="Deville2")
approx_var_est(ys, piks, method="Deville3")
approx_var_est(ys, piks, method="Hajek")
approx_var_est(ys, piks, method="Rosen")
approx_var_est(ys, piks, method="FixedPoint")
approx_var_est(ys, piks, method="Brewer1")
### Estimators of class 3 ---
approx_var_est(ys, pik, method="HartleyRao", sample=s)
approx_var_est(ys, pik, method="Berger", sample=s)
approx_var_est(ys, pik, method="Tille", sample=s)
approx_var_est(ys, pik, method="MateiTille1", sample=s)
approx_var_est(ys, pik, method="MateiTille2", sample=s)
approx_var_est(ys, pik, method="MateiTille3", sample=s)
approx_var_est(ys, pik, method="MateiTille4", sample=s)
approx_var_est(ys, pik, method="MateiTille5", sample=s)
approx_var_est(ys, pik, method="Brewer2", sample=s)
approx_var_est(ys, pik, method="Brewer3", sample=s)
approx_var_est(ys, pik, method="Brewer4", sample=s)
Check if a number is integer
Description
Check whether x is an integer number, as opposed to is.integer, which checks the type of the object x.
Usage
is.wholenumber(x, tol = .Machine$double.eps^0.5)
Arguments
x: a scalar or a numeric vector
tol: a scalar, indicating the tolerance
Note
From the help page of function is.integer
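For example (assuming is.wholenumber is accessible; it may be internal to the package):
is.integer(1)                        # FALSE: 1 is stored as a double
is.wholenumber(1)                    # TRUE: its value is a whole number
is.wholenumber(seq(1, 3, by = 0.5))  # TRUE FALSE TRUE FALSE TRUE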
Berger approximate variance estimator
Description
Compute the Berger approximate variance estimator (Berger, 1998). It is an estimator of class 3: it requires only first-order inclusion probabilities, but for all population units.
Usage
var_Berger(y, pik, sample)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities for all population units
sample: either a numeric vector of length equal to the sample size containing the indices of the sample units, or a boolean vector of the same length as pik, indicating which population units are in the sample
Value
a scalar, the estimated variance
Approximate Variance Estimators by Brewer (2002)
Description
Computes an approximate variance estimate according to one of the estimators proposed by Brewer (2002). This estimator belongs to class 2: it requires only first-order inclusion probabilities, and only for sample units.
Usage
var_Brewer_class2(y, pik)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities for sample units
Value
a scalar, the estimated variance
Approximate Variance Estimators by Brewer (2002)
Description
Compute an approximate variance estimate using the class of estimators proposed by Brewer (2002). These are estimators of class 3: they require only first-order inclusion probabilities, but for all population units.
Usage
var_Brewer_class3(y, pik, method, sample)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities for all population units
method: string indicating the desired approximate variance estimator. One of "Deville1", "Deville2", "Deville3", "Hajek", "Rosen", "FixedPoint", "Brewer1", "HartleyRao", "Berger", "Tille", "MateiTille1", "MateiTille2", "MateiTille3", "MateiTille4", "MateiTille5", "Brewer2", "Brewer3", "Brewer4".
sample: either a numeric vector of length equal to the sample size containing the indices of the sample units, or a boolean vector of the same length as pik, indicating which population units are in the sample
Value
a scalar, the estimated variance
Deville's approximate variance estimators
Description
Compute Deville's approximate variance estimators of class 2, which require only first-order inclusion probabilities, and only for sample units.
Usage
var_Deville(y, pik, method)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities for sample units
method: string indicating the desired approximate variance estimator. One of "Deville1", "Deville2", "Deville3", "Hajek", "Rosen", "FixedPoint", "Brewer1", "HartleyRao", "Berger", "Tille", "MateiTille1", "MateiTille2", "MateiTille3", "MateiTille4", "MateiTille5", "Brewer2", "Brewer3", "Brewer4".
Value
a scalar, the estimated variance
Fixed-Point approximate variance estimator
Description
Fixed-point estimator for approximate variance estimation, by Deville and Tillé (2005). It is an estimator of class 2: it requires only first-order inclusion probabilities, and only for sample units.
Usage
var_FixedPoint(y, pik, maxIter = 1000, eps = 1e-05)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities for sample units
maxIter: a scalar indicating the maximum number of iterations for the fixed-point procedure
eps: tolerance value for the convergence of the fixed-point procedure
Value
a scalar, the estimated variance
Hájek Approximate Variance Estimator
Description
Compute an approximate variance estimate using the approximation of joint-inclusion probabilities proposed by Hájek (1964). It is an estimator of class 2: it requires only first-order inclusion probabilities, and only for sample units.
Usage
var_Hajek(y, pik)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities for sample units
Value
a scalar, the estimated variance
Hartley and Rao approximate variance estimator
Description
Compute an approximate variance estimator based on the approximation of joint-inclusion probabilities proposed by Hartley and Rao (1962). It is an estimator of class 3: it requires only first-order inclusion probabilities, but for all population units.
Usage
var_HartleyRao(y, pik, sample)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities for all population units
sample: either a numeric vector of length equal to the sample size containing the indices of the sample units, or a boolean vector of the same length as pik, indicating which population units are in the sample
Value
a scalar, the estimated variance
Approximate Variance Estimators by Matei and Tillé (2005)
Description
Computes an approximate variance estimate according to one of the estimators proposed by Matei and Tillé (2005). These are estimators of class 3: they require only first-order inclusion probabilities, but for all population units.
Usage
var_MateiTille(y, pik, method, sample, maxIter = 1000, eps = 1e-05)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities for all population units
method: string indicating the desired approximate variance estimator. One of "Deville1", "Deville2", "Deville3", "Hajek", "Rosen", "FixedPoint", "Brewer1", "HartleyRao", "Berger", "Tille", "MateiTille1", "MateiTille2", "MateiTille3", "MateiTille4", "MateiTille5", "Brewer2", "Brewer3", "Brewer4".
sample: either a numeric vector of length equal to the sample size containing the indices of the sample units, or a boolean vector of the same length as pik, indicating which population units are in the sample
maxIter: a scalar indicating the maximum number of iterations for the fixed-point procedure
eps: tolerance value for the convergence of the fixed-point procedure
Value
a scalar, the estimated variance
Rosén Approximate Variance Estimator
Description
Compute an approximate variance estimate using the estimator proposed by Rosén (1991). It is an estimator of class 2: it requires only first-order inclusion probabilities, and only for sample units.
Usage
var_Rosen(y, pik)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities for sample units
Value
a scalar, the estimated variance
Tillé's Approximate Variance Estimator
Description
Compute an approximate variance estimate using the estimator proposed by Tillé (1996). It is an estimator of class 3: it requires only first-order inclusion probabilities, but for all population units.
Usage
var_Tille(y, pik, sample, maxIter = 1000, eps = 1e-05)
Arguments
y: numeric vector of sample observations
pik: numeric vector of first-order inclusion probabilities for all population units
sample: either a numeric vector of length equal to the sample size containing the indices of the sample units, or a boolean vector of the same length as pik, indicating which population units are in the sample
maxIter: a scalar indicating the maximum number of iterations for the fixed-point procedure
eps: tolerance value for the convergence of the fixed-point procedure
Value
a scalar, the estimated variance
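For illustration, the iterative procedure for the \beta_i coefficients described under method = "Tille" in approx_var_est can be sketched directly in R. This is a rough transcription of the formulas given there, assuming the population vector pik, sample indices s, and sample vectors ys and piks as in the Examples of approx_var_est; object names are illustrative and the package's internal implementation may differ.
maxIter <- 1000; eps <- 1e-05
n    <- length(ys)
beta <- pik                                                        # beta_i^(0) = pi_i
for (k in seq_len(maxIter)) {
  beta_old <- beta
  beta <- (n - 1) * pik / (sum(beta) - beta)                       # odd-step update
  beta <- beta * sqrt(n * (n - 1) / (sum(beta)^2 - sum(beta^2)))   # even-step update
  if (max(abs(beta - beta_old)) < eps) break
}
omega   <- (pik / beta)[s]                      # omega_i = pi_i / beta_i, sample units
ytilde  <- ys / piks                            # y_i / pi_i
ybar_w  <- sum(omega * ytilde) / sum(omega)     # weighted mean of the y_i / pi_i
tht     <- sum(ytilde)                          # Horvitz-Thompson total estimate
v_tille <- sum(omega) * sum(omega * (ytilde - ybar_w)^2) -
  n * sum((ytilde - tht / n)^2)
## Compare with approx_var_est(ys, pik, method = "Tille", sample = s).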