Type: Package
Title: Blocked Weighted Bootstrap
Version: 0.3.0
Description: The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in rapid assessment method or RAM and simple spatial sampling method or S3M surveys) is implemented. See Cameron et al (2008) <doi:10.1162/rest.90.3.414> for application of bootstrap to cluster samples. See Aaron et al (2016) <doi:10.1371/journal.pone.0163176> and Aaron et al (2016) <doi:10.1371/journal.pone.0162462> for application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys.
License: GPL-3
Depends: R (≥ 4.1.0)
Imports: car, cli, doParallel, foreach, methods, parallel, parallelly, stats, stringr, withr
Suggests: knitr, rmarkdown, testthat, spelling, covr
Encoding: UTF-8
Language: en-GB
LazyData: true
RoxygenNote: 7.3.2
URL: https://github.com/rapidsurveys/bbw, https://rapidsurveys.io/bbw/
BugReports: https://github.com/rapidsurveys/bbw/issues
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-01-15 22:39:09 UTC; ernestguevarra
Author: Mark Myatt [aut, cph], Ernest Guevarra [aut, cre, cph]
Maintainer: Ernest Guevarra <ernestgmd@gmail.com>
Repository: CRAN
Date/Publication: 2025-01-16 09:00:06 UTC

Blocked Weighted Bootstrap

Description

The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in Rapid Assessment Method or RAM and Simple Spatial Sampling Method or S3M surveys) is implemented.

Details

The bootstrap technique is described in this article. The BBW used in RAM and S3M is a modification to the percentile bootstrap to include blocking and weighting to account for a complex sample design.

With RAM and S3M surveys, the sample is complex in the sense that it is an unweighted cluster sample. Data analysis procedures need to account for the sample design. A blocked weighted bootstrap (BBW) can be used:

In the case of prior weighting by PPS all clusters are given the same weight. With posterior weighting (as in RAM or S3M) the weight is the population of each PSU. This procedure is very similar to the fitness proportionate selection technique used in evolutionary computing.

A total of m PSUs are sampled with replacement for each bootstrap replicate (where m is the number of PSUs in the survey sample).
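The weighted selection of PSUs described above can be sketched in base R. This is a minimal illustration of fitness proportionate ("roulette wheel") selection, not the package's own code: the table `w` and its values are made up, and the column names `weight` and `cumWeight` are chosen to mirror the output described for boot_bw_weight().

```r
# Hypothetical PSU population table (illustrative values only)
w <- data.frame(psu = 1:5, pop = c(100, 300, 50, 250, 300))

# Posterior weighting: selection probability proportional to PSU population.
# For prior weighting (PPS), every PSU would get the same weight instead.
w$weight    <- w$pop / sum(w$pop)
w$cumWeight <- cumsum(w$weight)

# Roulette wheel: draw m uniform numbers and, for each draw, pick the first
# PSU whose cumulative weight is at or above the draw
set.seed(1)
m     <- nrow(w)
draws <- runif(m)
idx   <- findInterval(draws, w$cumWeight, left.open = TRUE) + 1
idx   <- pmin(idx, nrow(w))  # guard against floating-point drift in cumsum()

selected <- w$psu[idx]       # m PSUs, sampled with replacement
selected
```

Because the draws are independent, the same PSU can appear more than once in a replicate, and PSUs with larger populations are selected more often on average.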

The required statistic is applied to each replicate. The reported estimate consists of the 0.025 (95% LCL), 0.5 (point estimate), and 0.975 (95% UCL) quantiles of the distribution of the statistic across all survey replicates.
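As a small illustration of the percentile step, the sketch below summarises a vector of replicate values with quantile(). The replicate values are simulated purely for demonstration and the names `replicate_values` and `est` are hypothetical.

```r
# Hypothetical replicate values of a proportion (simulated for illustration)
set.seed(42)
replicate_values <- rbeta(400, shape1 = 30, shape2 = 70)

# Median as the point estimate, with 95% lower and upper confidence limits
est <- quantile(replicate_values, probs = c(0.5, 0.025, 0.975), na.rm = TRUE)
names(est) <- c("estimate", "lcl", "ucl")
est
```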

Early versions of the {bbw} package did not resample observations within PSUs following:

Cameron AC, Gelbach JB, Miller DL, Bootstrap-based improvements for inference with clustered errors, Review of Economics and Statistics, 2008:90;414–427 doi:10.1162/rest.90.3.414

and used a large number (e.g. 3999) of survey replicates. Current versions of the {bbw} package resample observations within PSUs and use a smaller number of survey replicates (e.g. n = 400). This is a more computationally efficient approach.

Author(s)

Maintainer: Ernest Guevarra ernestgmd@gmail.com (ORCID) [copyright holder]

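Authors:

  • Mark Myatt [copyright holder]

See Also

Useful links:

  • https://github.com/rapidsurveys/bbw
  • https://rapidsurveys.io/bbw/
  • Report bugs at https://github.com/rapidsurveys/bbw/issues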


Blocked Weighted Bootstrap

Description

The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in SMART surveys) or posterior weighting (e.g. as used in RAM and S3M surveys) is implemented.

Usage

bootBW(x, w, statistic, params, outputColumns = params, replicates = 400)

Arguments

x

A data.frame() with primary sampling unit (PSU) in variable named psu and at least one other variable containing data for estimation.

w

A data.frame() with primary sampling unit (PSU) in variable named psu and survey weights (i.e. PSU population) in variable named pop.

statistic

An estimator function operating on variables in x containing data for estimation. The functions bootClassic() and bootPROBIT() are examples.

params

Parameters specified as names of columns in x that are to be passed to the function specified in statistic.

outputColumns

Names to be used for columns in output data.frame(). Defaults to names specified in params.

replicates

Number of bootstrap replicates to be performed. Default is 400.

Value

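A data.frame() with:

  • number of columns equal to the length of outputColumns
  • number of rows equal to the number of replicates
  • names of variables equal to the values of outputColumns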

Examples

# Example call to bootBW function using RAM-OP test data:

bootP <- bootBW(
  x = indicatorsHH, w = villageData, statistic = bootClassic,
  params = "anc1", outputColumns = "anc1", replicates = 9
)

# Example estimate with 95% CI:
quantile(bootP$anc1, probs = c(0.500, 0.025, 0.975), na.rm = TRUE)


Simple proportion statistics function for bootstrap estimation

Description

Simple proportion statistics function for bootstrap estimation

Usage

bootClassic(x, params)

Arguments

x

A data frame with primary sampling unit (PSU) in column named psu and with data column/s containing the binary variable/s (0/1) of interest with column names corresponding to params values

params

A vector of column names corresponding to the binary variables of interest contained in x

Value

A numeric vector of the mean of each binary variable of interest with length equal to length(params)
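Any function that takes a data frame and a vector of column names and returns one number per column fits the shape described here. The sketch below is a hypothetical stand-in (`boot_mean_sketch` is not part of the package and is not its source code) that mirrors the documented behaviour on a toy data frame; dropping missing values is an assumption.

```r
# Hypothetical stand-in for a statistic function; not the package source
boot_mean_sketch <- function(x, params) {
  # Mean of each 0/1 column named in params, ignoring missing values
  vapply(
    params,
    function(p) mean(x[[p]], na.rm = TRUE),
    numeric(1),
    USE.NAMES = FALSE
  )
}

# Toy resampled data with two binary indicators
toy <- data.frame(
  psu  = c(1, 1, 2, 2),
  anc1 = c(1, 0, 1, 1),
  anc2 = c(0, 0, 1, NA)
)

res <- boot_mean_sketch(toy, c("anc1", "anc2"))
res
```

The result has one element per entry in params, which is what allows the bootstrap functions to bind replicate results into one row per replicate.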

Examples

# Example call to bootClassic function
sampled_clusters <- boot_bw_sample_clusters(
  x = indicatorsHH, w = boot_bw_weight(villageData)
)

boot <- boot_bw_sample_within_clusters(sampled_clusters)

bootClassic(boot, "anc1")


PROBIT statistics function for bootstrap estimation

Description

PROBIT statistics function for bootstrap estimation

Usage

bootPROBIT(x, params, threshold = THRESHOLD)

Arguments

x

A data frame with primary sampling unit (PSU) in column named psu and with data column/s containing the continuous variable/s of interest with column names corresponding to params values

params

A vector of column names corresponding to the continuous variables of interest contained in x

threshold

Cut-off value for the continuous variable, used to differentiate cases from non-cases

Value

A numeric vector of the PROBIT estimate of each continuous variable of interest with length equal to length(params)
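A PROBIT-style estimate replaces a count of cases with the proportion of a fitted normal distribution that falls below the threshold. The sketch below is one way this could look, assuming the location is taken as the median and the spread as IQR / 1.34898; both choices, and the helper name `probit_sketch`, are assumptions for illustration, and bootPROBIT() itself may differ in detail.

```r
# Hypothetical sketch of a PROBIT-style estimator; not the package source
probit_sketch <- function(x, params, threshold) {
  vapply(
    params,
    function(p) {
      d <- x[[p]]
      m <- median(d, na.rm = TRUE)         # robust estimate of location
      s <- IQR(d, na.rm = TRUE) / 1.34898  # IQR of a normal is about 1.349 sd
      # Proportion of the fitted normal distribution below the threshold
      pnorm(q = threshold, mean = m, sd = s)
    },
    numeric(1),
    USE.NAMES = FALSE
  )
}

# Toy data: arm circumference in mm (made-up values)
toy <- data.frame(
  psu   = rep(1:2, each = 5),
  muac1 = c(110, 120, 125, 130, 135, 138, 140, 145, 150, 160)
)

p <- probit_sketch(toy, "muac1", threshold = 115)
p
```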

Examples

# Example call to bootPROBIT function:
sampled_clusters <- boot_bw_sample_clusters(
  x = indicatorsCH1, w = boot_bw_weight(villageData)
)

boot <- boot_bw_sample_within_clusters(sampled_clusters)

bootPROBIT(x = boot,
           params = "muac1",
           threshold = 115)


Blocked Weighted Bootstrap - vectorised and parallel

Description

These functions are an alternative to the bootBW() function. They attempt to make the blocked weighted bootstrap algorithm more efficient through vectorisation and parallelisation. The function syntax has been kept consistent with bootBW() for ease of transition. A more in-depth discussion of the efficiencies gained from this alternative can be found here.

Usage

boot_bw(
  x,
  w,
  statistic,
  params,
  outputColumns = params,
  replicates = 400,
  strata = NULL,
  parallel = FALSE,
  cores = parallelly::availableCores(omit = 1)
)

boot_bw_parallel(
  x,
  w,
  statistic,
  params,
  outputColumns = params,
  replicates = 400,
  strata = NULL,
  cores = parallelly::availableCores(omit = 1)
)

boot_bw_sequential(
  x,
  w,
  statistic,
  params,
  outputColumns = params,
  replicates = 400,
  strata = NULL
)

boot_bw_weight(w)

boot_bw_sample_clusters(x, w, index = FALSE)

boot_bw_sample_within_clusters(cluster_df)

Arguments

x

A data.frame() with primary sampling unit (PSU) in variable named psu and at least one other variable containing data for estimation.

w

A data.frame() with primary sampling unit (PSU) in variable named psu and survey weights (i.e. PSU population) in variable named pop.

statistic

An estimator function operating on variables in x containing data for estimation. The functions bootClassic() and bootPROBIT() are examples.

params

Parameters specified as names of columns in x that are to be passed to the function specified in statistic.

outputColumns

Names to be used for columns in output data.frame(). Defaults to names specified in params.

replicates

Number of bootstrap replicates to be performed. Default is 400.

strata

A character value giving the name of the variable in x that defines how x is grouped, so that resampling is performed separately for each group. Defaults to NULL, in which case there is no grouping and resampling is performed on the full data.

parallel

Logical. Should resampling be done in parallel? Defaults to FALSE.

cores

The number of computer cores to use, or the number of child processes to run simultaneously. Defaults to one less than the number of cores available on the current machine.

index

Logical. Should index values be returned instead of a list of data.frame()s? Defaults to FALSE.

cluster_df

A list of data.frame()s for selected clusters.

Value

  • For boot_bw(), a data.frame() with number of columns equal to the length of outputColumns, number of rows equal to the number of replicates, and names of variables equal to the values of outputColumns.

  • For boot_bw_weight(), a data.frame() based on w with two additional variables, weight and cumWeight.

  • For boot_bw_sample_clusters(), either a vector of integers corresponding to the primary sampling unit (psu) identifiers of the selected clusters (when index = TRUE) or a list of data.frame()s corresponding to the data for the selected clusters (when index = FALSE).

  • For boot_bw_sample_within_clusters(), a matrix similar in structure to x of resampled data from each selected cluster.
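The within-cluster step can be pictured as resampling rows with replacement inside each selected cluster and then stacking the results. The sketch below uses a hypothetical helper, `resample_within`, on toy data; it returns a data.frame for simplicity and is not the package's implementation.

```r
# Hypothetical helper: resample rows with replacement inside each selected
# cluster, keeping each cluster's size, then stack the results
resample_within <- function(cluster_list) {
  resampled <- lapply(cluster_list, function(df) {
    rows <- sample(nrow(df), size = nrow(df), replace = TRUE)
    df[rows, , drop = FALSE]
  })
  do.call(rbind, resampled)
}

# Two toy clusters, as if returned by a cluster-sampling step
clusters <- list(
  data.frame(psu = 1, anc1 = c(1, 0, 1)),
  data.frame(psu = 2, anc1 = c(0, 0))
)

set.seed(7)
boot <- resample_within(clusters)
table(boot$psu)
```

Each cluster contributes as many rows as it had originally, so the replicate keeps the same overall size as the selected clusters combined.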

Examples

boot_bw(
  x = indicatorsHH, w = villageData, statistic = bootClassic, 
  params = "anc1", replicates = 9, parallel = TRUE
)


Estimate median and confidence intervals from bootstrap replicates

Description

Estimate median and confidence intervals from bootstrap replicates

Usage

boot_bw_estimate(boot_df)

Arguments

boot_df

A data.frame() or a list of data.frame()s of bootstrap replicates with columns for each indicator to estimate. This is produced by a call to boot_bw().

Value

A data.frame() with rows equal to the number of columns of boot_df and 4 columns for indicator, estimate, 95% lower confidence limit, and 95% upper confidence limit.

Examples

boot_df <- boot_bw(
  x = indicatorsHH, w = villageData, statistic = bootClassic,
  params = "anc1", parallel = TRUE, replicates = 9
)

boot_bw_estimate(boot_df)


Boot estimate

Description

Boot estimate

Usage

boot_percentile(boot_df)

Calculate confidence limits

Description

Calculate confidence limits

Usage

calc_total_ci(est, pop, se, ci = c("lcl", "ucl"))

Calculate total estimate

Description

Calculate total estimate

Usage

calc_total_estimate(est, pop)

Calculate total sd

Description

Calculate total sd

Usage

calc_total_sd(se, pop)

Check data

Description

Check data

Usage

check_data(x)

Check est_df

Description

Check est_df

Usage

check_est_df(est_df, strata)

Check variables

Description

Check variables

Usage

check_params(x, params)

Check pop_df

Description

Check pop_df

Usage

check_pop_df(pop_df)

Post-stratification analysis

Description

Post-stratification analysis

Usage

estimate_total(est_df, pop_df, strata)

Arguments

est_df

A data.frame() of stratified indicator estimates to get overall estimates of. est_df should have a variable named est for the values of the indicator estimate, a variable named strata for information on the stratification or grouping of the estimates, and a variable named se for the standard errors for the values of the indicator estimate. This is usually produced via a call to boot_bw_estimate().

pop_df

A data.frame() with at least two variables: strata for the stratification/grouping information that matches strata in est_df and pop for information on population for the given strata.

strata

A character value of the variable name in est_df that corresponds to the strata values to match with values in pop_df

Value

A vector of values for the overall estimate, overall 95% lower confidence limit, and overall 95% upper confidence limit for each of the strata in est_df.
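To make the idea of a post-stratified overall estimate concrete, the sketch below combines per-stratum estimates using population shares as weights. The data values are made up, and the standard error and confidence limit formulas assume independent strata and a normal approximation; these are textbook choices for illustration, and the package's internal calculations may differ.

```r
# Hypothetical per-stratum estimates and populations (illustrative values)
est_df <- data.frame(
  strata = c("A", "B"),
  est    = c(0.20, 0.40),
  se     = c(0.02, 0.04)
)
pop_df <- data.frame(strata = c("A", "B"), pop = c(100, 300))

d     <- merge(est_df, pop_df, by = "strata")
share <- d$pop / sum(d$pop)                 # population share of each stratum

total_est <- sum(share * d$est)             # population-weighted estimate
total_se  <- sqrt(sum(share^2 * d$se^2))    # assumes independent strata

total <- c(
  est = total_est,
  lcl = total_est - qnorm(0.975) * total_se,  # normal-approximation limits
  ucl = total_est + qnorm(0.975) * total_se
)
total
```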

Examples

est_df <- boot_bw(
  x = indicatorsHH, w = villageData, statistic = bootClassic, 
  params = "anc1", strata = "region", replicates = 9, parallel = TRUE
) |>
  boot_bw_estimate()

## Add population ----
pop_df <- somalia_population |>
  subset(select = c(region, total))

names(pop_df) <- c("strata", "pop")

estimate_total(est_df, pop_df, strata = "region")


Estimate post-stratification weighted totals

Description

Estimate post-stratification weighted totals

Usage

estimate_total_(est_pop_df)

Get levels of stratification

Description

Get levels of stratification

Usage

get_strata(x, strata)

Child Morbidity, Health Service Coverage, Anthropometry

Description

Child indicators on morbidity, health service coverage and anthropometry calculated from data collected in a survey conducted in 4 districts from 3 regions in Somalia.

Usage

indicatorsCH1

Format

A data frame with 16 columns and 3090 rows.

Variable  Description
region    Region in Somalia to which the cluster belongs
district  District in Somalia to which the cluster belongs
psu       The PSU identifier. This must use the same coding system that is used to identify the PSUs in the indicators dataset
mID       The mother identifier
cID       The child identifier
ch1       Diarrhoea in the past 2 weeks (0/1)
ch2       Fever in the past 2 weeks (0/1)
ch3       Cough in the past 2 weeks (0/1)
ch4       Immunisation card (0/1)
ch5       BCG immunisation (0/1)
ch6       Vitamin A coverage in the past month (0/1)
ch7       Anti-helminth coverage in the past month (0/1)
sex       Sex of child
muac1     Mid-upper arm circumference in mm
muac2     Mid-upper arm circumference in mm
oedema    Oedema (0/1)

Source

Mother and child health and nutrition survey in 3 regions of Somalia

Examples

indicatorsCH1


Infant and Child Feeding Index

Description

Infant and young child feeding indicators using the infant and child feeding index (ICFI) by Arimond and Ruel, calculated from data collected in a survey conducted in 4 districts from 3 regions in Somalia.

Usage

indicatorsCH2

Format

A data frame with 15 columns and 2083 rows.

Variable  Description
region    Region in Somalia to which the cluster belongs
district  District in Somalia to which the cluster belongs
psu       The PSU identifier. This must use the same coding system that is used to identify the PSUs in the indicators dataset
mID       The mother identifier
cID       The child identifier
ebf       Exclusive breastfeeding (0/1)
cbf       Continued breastfeeding (0/1)
ddd       Dietary diversity (0/1)
mfd       Meal frequency (0/1)
icfi      Infant and child feeding index (from 0 to 6)
iycf      Good IYCF
icfiProp  Good ICFI
age       Child's age
bf        Child is breastfeeding (0/1)
bfStop    Age in months child stopped breastfeeding

Source

Mother and child health and nutrition survey in 3 regions of Somalia

Examples

indicatorsCH2


Mother Indicators Dataset

Description

Mother indicators for health and nutrition calculated from data collected in a survey conducted in 4 districts from 3 regions in Somalia.

Usage

indicatorsHH

Format

A data frame with 26 columns and 2136 rows:

Variable  Description
region    Region in Somalia to which the cluster belongs
district  District in Somalia to which the cluster belongs
psu       The PSU identifier. This must use the same coding system that is used to identify the PSUs in the indicators dataset
mID       The mother identifier
mMUAC     Mothers with mid-upper arm circumference < 230 mm (0/1)
anc1      At least 1 antenatal care visit with a trained health professional (0/1)
anc2      At least 4 antenatal care visits with any service provider (0/1)
anc3      FeFol coverage (0/1)
anc4      Vitamin A coverage (0/1)
wash1     Improved sources of drinking water (0/1)
wash2     Improved sources of other water (0/1)
wash3     Probable safe drinking water (0/1)
wash4     Number of litres of water collected in a day
wash5     Improved toilet facilities (0/1)
wash6     Human waste disposal practices / behaviour (0/1)
wash7a    Handwashing score (from 0 to 5)
wash7b    Handwashing score of 5 (0/1)
hhs1      Household hunger score (from 0 to 6)
hhs2      Little or no hunger (0/1)
hhs3      Moderate hunger (0/1)
hhs4      Severe hunger (0/1)
mfg       Mother's dietary diversity score
pVitA     Plant-based vitamin A-rich foods (0/1)
aVitA     Animal-based vitamin A-rich foods (0/1)
xVitA     Any vitamin A-rich foods (0/1)
iron      Iron-rich foods (0/1)

Source

Mother and child health and nutrition survey in 3 regions of Somalia

Examples

indicatorsHH


Recode

Description

Utility function that recodes variables based on user recode specifications. Handles both numeric and factor variables.

Usage

recode(var, recodes, afr, anr = TRUE, levels)

Arguments

var

Variable to recode

recodes

Character string of recode specifications:

  • Recode specifications are given in a character string, separated by semicolons, each of the form input=output, as in: "1=1;2=1;3:6=2;else=NA"

  • If an input value satisfies more than one specification, then the first (reading from left to right) is applied

  • If no specification is satisfied, then the input value is carried over to the result unchanged

  • NA is allowed on both input and output

  • The following recode specifications are supported:

    Specification    Example           Notes
    Single values    9=NA
    Set of values    c(1,2,5)=1        The left-hand side is any R function call that returns a vector
                     seq(1,9,2)='odd'
                     1:10=1
    Range of values  7:9=3             Special values lo and hi may be used
                     lo:115=1
    Other values     else=NA

  • Character values are quoted, as in:

    recodes = "c(1,2,5)='sanitary';else='unsanitary'"

  • The output may be the (scalar) result of a function call, as in:

    recodes = "999=median(var, na.rm = TRUE)"

  • Users are advised to carefully check the results of recode() calls with any outputs that are the results of a function call.

  • The output may be the (scalar) value of a variable, as in:

    recodes = "999=scalarVariable"

  • If all of the output values are numeric, and if afr is FALSE, then a numeric result is returned; if var is a factor then (by default) so is the result.
    
afr

Return a factor. Default is TRUE if var is a factor and is FALSE otherwise

anr

Coerce result to numeric (default is TRUE)

levels

Order of the levels in the returned factor; the default is to use the sort order of the level names.

Value

Recoded variable

Examples

# Recode values from 1 to 9 to various specifications
var <- sample(x = 1:9, size = 100, replace = TRUE)

# Recode single values
recode(var = var, recodes = "9=NA")

# Recode set of values
recode(var = var, recodes = "c(1,2,5)=1")

# Recode range of values
recode(var = var, recodes = "1:3=1;4:6=2;7:9=3")

# Recode other values
recode(var = var, recodes = "c(1,2,5)=1;else=NA")


Somalia regional population in 2022

Description

Regional population of Somalia in 2022, provided as a data.frame with 19 rows and 18 columns.

Usage

somalia_population

Format

An object of class data.frame with 19 rows and 18 columns.

Details

Variable                 Description
region                   Region name
total                    Total population
urban                    Total urban population
rural                    Total rural population
idp                      Total IDP population
urban_stressed           Total urban population - stressed
rural_stressed           Total rural population - stressed
idp_stressed             Total IDP population - stressed
urban_crisis             Total urban population - crisis
rural_crisis             Total rural population - crisis
idp_crisis               Total IDP population - crisis
urban_emergency          Total urban population - emergency
rural_emergency          Total rural population - emergency
idp_emergency            Total IDP population - emergency
urban_catastrophe        Total urban population - catastrophe
rural_catastrophe        Total rural population - catastrophe
idp_catastrophe          Total IDP population - catastrophe
percent_at_least_crisis  Percentage of population that are at least in crisis

Source

https://fsnau.org/downloads/2022-Gu-IPC-Population-Tables-Current.pdf


Tidy bootstraps

Description

Tidy bootstraps

Usage

tidy_boot(boot, x, strata, outputColumns)

Cluster Population Weights Dataset

Description

Dataset containing cluster population weights for use in performing posterior weighting with the blocked weighted bootstrap approach. This dataset is from a mother and child health and nutrition survey conducted in 4 districts from 3 regions in Somalia.

Usage

villageData

Format

A data frame with 6 columns and 117 rows:

Variable  Description
region    Region in Somalia to which the cluster belongs
district  District in Somalia to which the cluster belongs
psu       The PSU identifier. This must use the same coding system that is used to identify the PSUs in the indicators dataset
lon       Longitude coordinate of the cluster
lat       Latitude coordinate of the cluster
pop       Population size of the cluster

Source

Mother and child health and nutrition survey in 3 regions of Somalia

Examples

villageData