Type: | Package |
Title: | Blocked Weighted Bootstrap |
Version: | 0.3.0 |
Description: | The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in rapid assessment method or RAM and simple spatial sampling method or S3M surveys) is implemented. See Cameron et al (2008) <doi:10.1162/rest.90.3.414> for application of bootstrap to cluster samples. See Aaron et al (2016) <doi:10.1371/journal.pone.0163176> and Aaron et al (2016) <doi:10.1371/journal.pone.0162462> for application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys. |
License: | GPL-3 |
Depends: | R (≥ 4.1.0) |
Imports: | car, cli, doParallel, foreach, methods, parallel, parallelly, stats, stringr, withr |
Suggests: | knitr, rmarkdown, testthat, spelling, covr |
Encoding: | UTF-8 |
Language: | en-GB |
LazyData: | true |
RoxygenNote: | 7.3.2 |
URL: | https://github.com/rapidsurveys/bbw, https://rapidsurveys.io/bbw/ |
BugReports: | https://github.com/rapidsurveys/bbw/issues |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-01-15 22:39:09 UTC; ernestguevarra |
Author: | Mark Myatt [aut, cph], Ernest Guevarra |
Maintainer: | Ernest Guevarra <ernestgmd@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-01-16 09:00:06 UTC |
Blocked Weighted Bootstrap
Description
The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in Rapid Assessment Method or RAM and Simple Spatial Sampling Method or S3M surveys) is implemented.
Details
The bootstrap technique is described in this article. The BBW used in RAM and S3M is a modification of the percentile bootstrap that adds blocking and weighting to account for a complex sample design.
With RAM and S3M surveys, the sample is complex in the sense that it is an unweighted cluster sample. Data analysis procedures need to account for the sample design. A blocked weighted bootstrap (BBW) can be used:
- Blocked: The block corresponds to the primary sampling unit (PSU = cluster). PSUs are resampled with replacement. Observations within the resampled PSUs are also sampled with replacement.
- Weighted: RAM and S3M samples do not use population proportional sampling (PPS) to weight the sample prior to data collection (e.g. as is done with SMART surveys). This means that a posterior weighting procedure is required. {bbw} uses a "roulette wheel" algorithm to weight (i.e. by population) the selection probability of PSUs in bootstrap replicates.
In the case of prior weighting by PPS all clusters are given the same weight. With posterior weighting (as in RAM or S3M) the weight is the population of each PSU. This procedure is very similar to the fitness proportionate selection technique used in evolutionary computing.
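The roulette-wheel selection described above can be sketched in base R. This is a minimal illustration, not the package's code: the toy PSU table and the `roulette_select()` helper are hypothetical, although the `weight` and `cumWeight` names mirror the two variables that `boot_bw_weight()` adds.

```r
# Hypothetical PSU table: PSU identifiers and their populations
psu_pop <- data.frame(psu = c(101, 102, 103, 104), pop = c(500, 1500, 250, 750))

# Hypothetical helper: draw m PSUs with replacement, with selection
# probability proportional to population ("roulette wheel")
roulette_select <- function(w, m = nrow(w)) {
  weight <- w$pop / sum(w$pop)   # each PSU's share of the wheel
  cumWeight <- cumsum(weight)    # right-hand edge of each slice
  spins <- runif(m)              # one spin of the wheel per PSU drawn
  # findInterval() counts the slice edges at or below each spin, so adding 1
  # gives the slice (PSU) the spin landed in; pmin() guards against rounding
  # leaving the last edge fractionally below 1
  idx <- pmin(findInterval(spins, cumWeight) + 1, nrow(w))
  w$psu[idx]
}

set.seed(2025)
roulette_select(psu_pop)
```

With equal populations (the prior-weighting PPS case) every slice has the same size and the draw reduces to simple random sampling of PSUs with replacement. In base R, `sample(w$psu, size = m, replace = TRUE, prob = w$pop)` performs the same kind of weighted draw.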
A total of m PSUs are sampled with replacement for each bootstrap replicate (where m is the number of PSUs in the survey sample).
The required statistic is applied to each replicate. The reported estimate consists of the 2.5th (95% LCL), 50th (point estimate), and 97.5th (95% UCL) percentiles of the distribution of the statistic across all survey replicates.
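The replicate-and-percentile procedure can be sketched end to end in base R. This is a simplified, single-stratum illustration on simulated data, not the {bbw} implementation; the toy data and the `bbw_sketch()` helper are hypothetical, and `sample()` with `prob` stands in for the roulette wheel.

```r
set.seed(42)

# Hypothetical toy survey: 6 PSUs of 10 respondents, one binary indicator y
toy <- data.frame(psu = rep(1:6, each = 10), y = rbinom(60, size = 1, prob = 0.4))
# Hypothetical PSU populations used for posterior weighting
toy_pop <- data.frame(psu = 1:6, pop = c(120, 300, 80, 500, 220, 150))

bbw_sketch <- function(x, w, statistic, replicates = 400) {
  m <- nrow(w)  # number of PSUs in the survey sample
  replicate(replicates, {
    # Blocked and weighted: draw m PSUs with replacement, with
    # probability proportional to PSU population
    psus <- sample(w$psu, size = m, replace = TRUE, prob = w$pop)
    # Resample observations with replacement within each selected PSU
    rep_df <- do.call(rbind, lapply(psus, function(p) {
      d <- x[x$psu == p, , drop = FALSE]
      d[sample.int(nrow(d), replace = TRUE), , drop = FALSE]
    }))
    # Apply the required statistic to the replicate
    statistic(rep_df)
  })
}

reps <- bbw_sketch(toy, toy_pop, statistic = function(d) mean(d$y))

# 2.5th (95% LCL), 50th (point estimate) and 97.5th (95% UCL) percentiles
quantile(reps, probs = c(0.025, 0.5, 0.975))
```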
Early versions of the {bbw} package did not resample observations within PSUs, following:
Cameron AC, Gelbach JB, Miller DL, Bootstrap-based improvements for inference with clustered errors, Review of Economics and Statistics, 2008;90:414–427 doi:10.1162/rest.90.3.414
and used a large number of survey replicates (e.g. 3999). Current versions of the {bbw} package resample observations within PSUs and use a smaller number of survey replicates (e.g. n = 400). This is a more computationally efficient approach.
Author(s)
Maintainer: Ernest Guevarra ernestgmd@gmail.com (ORCID) [copyright holder]
Authors:
Mark Myatt mark@brixtonhealth.com [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/rapidsurveys/bbw/issues
Blocked Weighted Bootstrap
Description
The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population proportional sampling or PPS as used in SMART surveys) or posterior weighting (e.g. as used in RAM and S3M surveys) is implemented.
Usage
bootBW(x, w, statistic, params, outputColumns = params, replicates = 400)
Arguments
x |
A |
w |
A |
statistic |
An estimator function operating on variables in |
params |
Parameters specified as names of columns in |
outputColumns |
Names to be used for columns in output |
replicates |
Number of bootstrap replicates to be performed. Default is 400. |
Value
A data.frame()
with:
number of columns equal to length of
outputColumns
;number of rows equal to number of
replicates
; and,'names equal to
outputColumns
.'
Examples
# Example call to bootBW function using RAM-OP test data:
bootP <- bootBW(
x = indicatorsHH, w = villageData, statistic = bootClassic,
params = "anc1", outputColumns = "anc1", replicates = 9
)
# Example estimate with 95% CI:
quantile(bootP$anc1, probs = c(0.500, 0.025, 0.975), na.rm = TRUE)
Simple proportion statistics function for bootstrap estimation
Description
Simple proportion statistics function for bootstrap estimation
Usage
bootClassic(x, params)
Arguments
x |
A data frame with primary sampling unit (PSU) in column named
|
params |
A vector of column names corresponding to the binary variables
of interest contained in |
Value
A numeric vector of the mean of each binary variable of interest with
length equal to length(params)
Examples
# Example call to bootClassic function
sampled_clusters <- boot_bw_sample_clusters(
x = indicatorsHH, w = boot_bw_weight(villageData)
)
boot <- boot_bw_sample_within_clusters(sampled_clusters)
bootClassic(boot, "anc1")
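As described under Value, this statistic is the mean of each binary variable in params, i.e. the proportion of 1s. A base-R equivalent can be sketched as follows; `classic_sketch()` and the toy data are hypothetical, and dropping missing values (na.rm = TRUE) is an assumption of this sketch rather than documented package behaviour.

```r
# Hypothetical stand-in for a "simple proportion" statistic:
# returns the mean (proportion of 1s) of each binary column in params
classic_sketch <- function(x, params) {
  vapply(params, function(p) mean(x[[p]], na.rm = TRUE), numeric(1))
}

d <- data.frame(psu = c(1, 1, 2, 2), anc1 = c(1, 0, 1, 1), anc2 = c(0, 0, 1, NA))
classic_sketch(d, params = c("anc1", "anc2"))
```

Here anc1 gives 0.75 and anc2 gives 1/3, because the NA is dropped before averaging.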
PROBIT statistics function for bootstrap estimation
Description
PROBIT statistics function for bootstrap estimation
Usage
bootPROBIT(x, params, threshold = THRESHOLD)
Arguments
x |
A data frame with primary sampling unit (PSU) in column named
|
params |
A vector of column names corresponding to the continuous
variables of interest contained in |
threshold |
Cut-off value for the continuous variable used to differentiate cases from non-cases |
Value
A numeric vector of the PROBIT estimate of each continuous variable
of interest with length equal to length(params)
Examples
# Example call to bootPROBIT function:
sampled_clusters <- boot_bw_sample_clusters(
x = indicatorsCH1, w = boot_bw_weight(villageData)
)
boot <- boot_bw_sample_within_clusters(sampled_clusters)
bootPROBIT(x = boot,
params = "muac1",
threshold = 115)
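The PROBIT approach estimates the proportion of cases from the distribution of a continuous variable rather than by counting cases. The simplified sketch below assumes approximate normality and returns the normal cumulative probability at the threshold. `probit_sketch()` is hypothetical; the package's own bootPROBIT() may transform the data before this step (car is among its Imports), so its results can differ.

```r
# Hypothetical PROBIT-style statistic: assumes each variable is roughly
# normal and returns P(value < threshold) from its sample mean and SD
probit_sketch <- function(x, params, threshold) {
  vapply(params, function(p) {
    v <- x[[p]]
    pnorm(threshold, mean = mean(v, na.rm = TRUE), sd = sd(v, na.rm = TRUE))
  }, numeric(1))
}

d <- data.frame(muac1 = c(110, 120, 130))
probit_sketch(d, params = "muac1", threshold = 120)
```

With a mean of 120 and an SD of 10, a threshold of 120 yields 0.5 and a threshold of 110 yields pnorm(-1), about 0.159.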
Blocked Weighted Bootstrap - vectorised and parallel
Description
This set of functions is an alternative to the bootBW() function. It attempts to make the blocked weighted bootstrap algorithm more efficient through vectorisation and parallelisation. The function syntax has been kept consistent with bootBW() for ease of transition. A more in-depth discussion of the efficiency gains from this alternative function can be found here.
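Because bootstrap replicates are independent of each other, they can be spread across worker processes. The sketch below uses the base parallel package to illustrate the idea. It is not necessarily how boot_bw_parallel() works (the package's Imports list doParallel and foreach, which this sketch does not use), and the toy data and the `one_replicate()` helper are hypothetical.

```r
library(parallel)

# Hypothetical toy data: 5 PSUs with populations and one binary indicator
toy <- data.frame(psu = rep(1:5, each = 8), y = rep(c(0, 1), times = 20))
toy_pop <- data.frame(psu = 1:5, pop = c(100, 250, 50, 400, 200))

# Hypothetical helper: one blocked weighted replicate returning mean(y).
# Working with row indices avoids copying each PSU's data frame.
one_replicate <- function(i, x, w) {
  psus <- sample(w$psu, size = nrow(w), replace = TRUE, prob = w$pop)
  rows <- unlist(lapply(psus, function(p) {
    idx <- which(x$psu == p)
    idx[sample.int(length(idx), replace = TRUE)]  # resample within the PSU
  }))
  mean(x$y[rows])
}

cl <- makeCluster(2)                  # two PSOCK worker processes
clusterSetRNGStream(cl, iseed = 123)  # reproducible parallel RNG streams
reps <- unlist(parLapply(cl, 1:400, one_replicate, x = toy, w = toy_pop))
stopCluster(cl)

quantile(reps, probs = c(0.025, 0.5, 0.975))
```

Index-based resampling is one simple form of vectorisation; whether it matches the package's own optimisations is not stated here.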
Usage
boot_bw(
x,
w,
statistic,
params,
outputColumns = params,
replicates = 400,
strata = NULL,
parallel = FALSE,
cores = parallelly::availableCores(omit = 1)
)
boot_bw_parallel(
x,
w,
statistic,
params,
outputColumns = params,
replicates = 400,
strata = NULL,
cores = parallelly::availableCores(omit = 1)
)
boot_bw_sequential(
x,
w,
statistic,
params,
outputColumns = params,
replicates = 400,
strata = NULL
)
boot_bw_weight(w)
boot_bw_sample_clusters(x, w, index = FALSE)
boot_bw_sample_within_clusters(cluster_df)
Arguments
x |
A |
w |
A |
statistic |
An estimator function operating on variables in |
params |
Parameters specified as names of columns in |
outputColumns |
Names to be used for columns in output |
replicates |
Number of bootstrap replicates to be performed. Default is 400. |
strata |
A character value for name of variable in |
parallel |
Logical. Should resampling be done in parallel? Defaults to FALSE. |
cores |
The number of computer cores to use or number of child processes to be run simultaneously. Defaults to one less than the number of available cores on the current machine. |
index |
Logical. Should index values be returned or a list of
|
cluster_df |
A list of |
Value
For boot_bw(), a data.frame() with number of columns equal to length of outputColumns; number of rows equal to number of replicates; and names of variables equal to values of outputColumns.
For boot_bw_weight(), a data.frame() based on w with two additional variables for weight and cumWeight.
For boot_bw_sample_clusters(), either a vector of integers corresponding to the primary sampling unit (psu) identifier of the selected clusters (when index = TRUE) or a list of data.frame()s corresponding to the data for the selected clusters (when index = FALSE).
For boot_bw_sample_within_clusters(), a matrix similar in structure to x of resampled data from each selected cluster.
Examples
boot_bw(
x = indicatorsHH, w = villageData, statistic = bootClassic,
params = "anc1", replicates = 9, parallel = TRUE
)
Estimate median and confidence intervals from bootstrap replicates
Description
Estimate median and confidence intervals from bootstrap replicates
Usage
boot_bw_estimate(boot_df)
Arguments
boot_df |
A |
Value
A data.frame() with rows equal to the number of columns of boot_df and 4 columns for indicator, estimate, 95% lower confidence limit, and 95% upper confidence limit.
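The summary described above can be sketched in base R by taking the 50th, 2.5th and 97.5th percentiles of each column of the replicates data frame. `estimate_sketch()`, its output column names (`indicator`, `est`, `lcl`, `ucl`) and the toy replicates are hypothetical; the package's own column names may differ.

```r
# Hypothetical replicates: 3 bootstrap replicates of two indicators
boot_df <- data.frame(anc1 = c(0.60, 0.70, 0.80), anc2 = c(0.20, 0.30, 0.40))

# Hypothetical summariser: one row per indicator (column of boot_df)
estimate_sketch <- function(boot_df) {
  # 3 x k matrix: rows are the 50%, 2.5% and 97.5% quantiles
  q <- sapply(boot_df, quantile, probs = c(0.5, 0.025, 0.975), na.rm = TRUE)
  data.frame(
    indicator = colnames(q),
    est = q[1, ], lcl = q[2, ], ucl = q[3, ],
    row.names = NULL
  )
}

estimate_sketch(boot_df)
```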
Examples
boot_df <- boot_bw(
x = indicatorsHH, w = villageData, statistic = bootClassic,
params = "anc1", parallel = TRUE, replicates = 9
)
boot_bw_estimate(boot_df)
Boot estimate
Description
Boot estimate
Usage
boot_percentile(boot_df)
Calculate confidence limits
Description
Calculate confidence limits
Usage
calc_total_ci(est, pop, se, ci = c("lcl", "ucl"))
Calculate total estimate
Description
Calculate total estimate
Usage
calc_total_estimate(est, pop)
Calculate total sd
Description
Calculate total sd
Usage
calc_total_sd(se, pop)
Check data
Description
Check data
Usage
check_data(x)
Check est_df
Description
Check est_df
Usage
check_est_df(est_df, strata)
Check variables
Description
Check variables
Usage
check_params(x, params)
Check pop_df
Description
Check pop_df
Usage
check_pop_df(pop_df)
Post-stratification analysis
Description
Post-stratification analysis
Usage
estimate_total(est_df, pop_df, strata)
Arguments
est_df |
A |
pop_df |
A |
strata |
A character value of the variable name in |
Value
A vector of values for the overall estimate, overall 95% lower
confidence limit, and overall 95% upper confidence limit for each of the
strata
in est_df
.
Examples
est_df <- boot_bw(
x = indicatorsHH, w = villageData, statistic = bootClassic,
params = "anc1", strata = "region", replicates = 9, parallel = TRUE
) |>
boot_bw_estimate()
## Add population ----
pop_df <- somalia_population |>
subset(select = c(region, total))
names(pop_df) <- c("strata", "pop")
estimate_total(est_df, pop_df, strata = "region")
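Post-stratification combines stratum-level estimates into an overall estimate weighted by stratum population. The sketch below uses the textbook stratified estimator: the overall estimate is the population-weighted mean of the stratum estimates, and its standard error is sqrt(sum(W_h^2 * SE_h^2)), where W_h is the stratum's population share. Recovering each SE_h from the 95% CI width divided by 2 × qnorm(0.975) (about 1.96) is an assumption made here, as is the hypothetical `post_stratify_sketch()` helper; calc_total_sd() and calc_total_ci() in the package may work differently.

```r
# Hypothetical stratum-level estimates (e.g. as produced by boot_bw_estimate())
est_df <- data.frame(
  strata = c("A", "B"),
  est = c(0.50, 0.70), lcl = c(0.40, 0.66), ucl = c(0.60, 0.74)
)
pop_df <- data.frame(strata = c("A", "B"), pop = c(100, 300))

post_stratify_sketch <- function(est_df, pop_df) {
  d <- merge(est_df, pop_df, by = "strata")
  W <- d$pop / sum(d$pop)                     # stratum population shares
  se <- (d$ucl - d$lcl) / (2 * qnorm(0.975))  # assumed: SE from 95% CI width
  est <- sum(W * d$est)                       # population-weighted estimate
  se_total <- sqrt(sum(W^2 * se^2))           # SE of a weighted sum of independent estimates
  c(est = est,
    lcl = est - qnorm(0.975) * se_total,
    ucl = est + qnorm(0.975) * se_total)
}

post_stratify_sketch(est_df, pop_df)
```

With shares of 0.25 and 0.75, the overall estimate is 0.25 × 0.50 + 0.75 × 0.70 = 0.65.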
Estimate post-stratification weighted totals
Description
Estimate post-stratification weighted totals
Usage
estimate_total_(est_pop_df)
Get levels of stratification
Description
Get levels of stratification
Usage
get_strata(x, strata)
Child Morbidity, Health Service Coverage, Anthropometry
Description
Child indicators on morbidity, health service coverage and anthropometry calculated from data collected in a survey conducted in 4 districts in 3 regions of Somalia.
Usage
indicatorsCH1
Format
A data frame with 16 columns and 3090 rows.
Variable | Description |
region | Region in Somalia to which the cluster belongs |
district | District in Somalia to which the cluster belongs |
psu | The PSU identifier. This must use the same coding system as that used to identify the PSUs in the indicators dataset |
mID | The mother identifier |
cID | The child identifier |
ch1 | Diarrhoea in the past 2 weeks (0/1) |
ch2 | Fever in the past 2 weeks (0/1) |
ch3 | Cough in the past 2 weeks (0/1) |
ch4 | Immunisation card (0/1) |
ch5 | BCG immunisation (0/1) |
ch6 | Vitamin A coverage in the past month (0/1) |
ch7 | Anti-helminth coverage in the past month (0/1) |
sex | Sex of child |
muac1 | Mid-upper arm circumference in mm |
muac2 | Mid-upper arm circumference in mm |
oedema | Oedema (0/1) |
Source
Mother and child health and nutrition survey in 3 regions of Somalia
Examples
indicatorsCH1
Infant and Child Feeding Index
Description
Infant and young child feeding indicators using the infant and child feeding index (ICFI) by Arimond and Ruel, calculated from data collected in a survey conducted in 4 districts in 3 regions of Somalia.
Usage
indicatorsCH2
Format
A data frame with 15 columns and 2083 rows.
Variable | Description |
region | Region in Somalia to which the cluster belongs |
district | District in Somalia to which the cluster belongs |
psu | The PSU identifier. This must use the same coding system as that used to identify the PSUs in the indicators dataset |
mID | The mother identifier |
cID | The child identifier |
ebf | Exclusive breastfeeding (0/1) |
cbf | Continued breastfeeding (0/1) |
ddd | Dietary diversity (0/1) |
mfd | Meal frequency (0/1) |
icfi | Infant and child feeding index (from 0 to 6) |
iycf | Good IYCF |
icfiProp | Good ICFI |
age | Child's age |
bf | Child is breastfeeding (0/1) |
bfStop | Age in months child stopped breastfeeding |
Source
Mother and child health and nutrition survey in 3 regions of Somalia
Examples
indicatorsCH2
Mother Indicators Dataset
Description
Mother indicators for health and nutrition calculated from data collected in a survey conducted in 4 districts in 3 regions of Somalia.
Usage
indicatorsHH
Format
A data frame with 26 columns and 2136 rows:
Variable | Description |
region | Region in Somalia to which the cluster belongs |
district | District in Somalia to which the cluster belongs |
psu | The PSU identifier. This must use the same coding system as that used to identify the PSUs in the indicators dataset |
mID | The mother identifier |
mMUAC | Mothers with mid-upper arm circumference < 230 mm (0/1) |
anc1 | At least 1 antenatal care visit with a trained health professional (0/1) |
anc2 | At least 4 antenatal care visits with any service provider (0/1) |
anc3 | FeFol coverage (0/1) |
anc4 | Vitamin A coverage (0/1) |
wash1 | Improved sources of drinking water (0/1) |
wash2 | Improved sources of other water (0/1) |
wash3 | Probable safe drinking water (0/1) |
wash4 | Number of litres of water collected in a day |
wash5 | Improved toilet facilities (0/1) |
wash6 | Human waste disposal practices / behaviour (0/1) |
wash7a | Handwashing score (from 0 to 5) |
wash7b | Handwashing score of 5 (0/1) |
hhs1 | Household hunger score (from 0 to 6) |
hhs2 | Little or no hunger (0/1) |
hhs3 | Moderate hunger (0/1) |
hhs4 | Severe hunger (0/1) |
mfg | Mother's dietary diversity score |
pVitA | Plant-based vitamin A-rich foods (0/1) |
aVitA | Animal-based vitamin A-rich foods (0/1) |
xVitA | Any vitamin A-rich foods (0/1) |
iron | Iron-rich foods (0/1) |
Source
Mother and child health and nutrition survey in 3 regions of Somalia
Examples
indicatorsHH
Recode
Description
Utility function that recodes variables based on user recode specifications. Handles both numeric and factor variables.
Usage
recode(var, recodes, afr, anr = TRUE, levels)
Arguments
var |
Variable to recode |
recodes |
Character string of recode specifications:
|
afr |
Return a factor. Default is TRUE if |
anr |
Coerce result to numeric (default is TRUE) |
levels |
Order of the levels in the returned factor; the default is to use the sort order of the level names. |
Value
Recoded variable
Examples
# Recode values from 1 to 9 to various specifications
var <- sample(x = 1:9, size = 100, replace = TRUE)
# Recode single values
recode(var = var, recodes = "9=NA")
# Recode set of values
recode(var = var, recodes = "c(1,2,5)=1")
# Recode range of values
recode(var = var, recodes = "1:3=1;4:6=2;7:9=3")
# Recode other values
recode(var = var, recodes = "c(1,2,5)=1;else=NA")
Somalia regional population in 2022
Description
A data.frame with 19 rows and 18 columns.
Usage
somalia_population
Format
An object of class data.frame
with 19 rows and 18 columns.
Details
Variable | Description |
region | Region name |
total | Total population |
urban | Total urban population |
rural | Total rural population |
idp | Total IDP population |
urban_stressed | Total urban population - stressed |
rural_stressed | Total rural population - stressed |
idp_stressed | Total IDP population - stressed |
urban_crisis | Total urban population - crisis |
rural_crisis | Total rural population - crisis |
idp_crisis | Total IDP population - crisis |
urban_emergency | Total urban population - emergency |
rural_emergency | Total rural population - emergency |
idp_emergency | Total IDP population - emergency |
urban_catastrophe | Total urban population - catastrophe |
rural_catastrophe | Total rural population - catastrophe |
idp_catastrophe | Total IDP population - catastrophe |
percent_at_least_crisis | Percentage of population that are at least in crisis |
Source
https://fsnau.org/downloads/2022-Gu-IPC-Population-Tables-Current.pdf
Tidy bootstraps
Description
Tidy bootstraps
Usage
tidy_boot(boot, x, strata, outputColumns)
Cluster Population Weights Dataset
Description
Dataset containing cluster population weights for use in posterior weighting with the blocked weighted bootstrap approach. The data come from a mother and child health and nutrition survey conducted in 4 districts in 3 regions of Somalia.
Usage
villageData
Format
A data frame with 6 columns and 117 rows:
Variable | Description |
region | Region in Somalia to which the cluster belongs |
district | District in Somalia to which the cluster belongs |
psu | The PSU identifier. This must use the same coding system as that used to identify the PSUs in the indicators dataset |
lon | Longitude coordinate of the cluster |
lat | Latitude coordinate of the cluster |
pop | Population size of the cluster |
Source
Mother and child health and nutrition survey in 3 regions of Somalia
Examples
villageData