fahb

library(fahb)
library(patchwork)

Introduction

Many trials fail to recruit the number of participants they need to answer their research question definitively. We generally want to identify these trials, which we call infeasible, early on so that they can be stopped and their valuable resources used elsewhere. We often do this by designating the first stage of a study as the pilot phase (or pilot trial), looking at the recruitment data once the pilot has finished, and then deciding whether to continue with the remainder of the trial.

Making a stop or go decision based on early recruitment data is not straightforward. We need to consider not just the number of participants recruited, but also how quickly sites are opening and how much variability there is between site recruitment rates.

Specifying the problem

The model

In the fahb package we use a hierarchical Poisson mixed model of recruitment, which means that site \(j \in \{1 , \ldots , m\}\) recruits participants at its own individual rate \(\gamma_j\), and that these rates follow a log-normal distribution: \[ \ln \gamma_j \sim N(\beta, \sigma^2). \] We also assume that sites open according to a Poisson process, with rate \(\lambda\).
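As a rough illustration of this model, the base R sketch below simulates the data that would be available at an interim look. It is not how fahb itself runs its simulations. The parameter values are arbitrary, and the names (`t_look`, `open_time`, `exposure`) are introduced here purely for illustration.

```r
# Illustrative parameter values only; in practice these are unknown.
set.seed(1)
m      <- 20    # number of sites
t_look <- 0.5   # time of the interim look (years)
lambda <- 10    # site opening rate (sites per year)
beta   <- 2     # mean of the log recruitment rate
sigma  <- 0.3   # SD of the log recruitment rate

# Poisson process: gaps between successive site openings are Exponential(lambda)
open_time <- cumsum(rexp(m, rate = lambda))

# Site-specific recruitment rates: ln(gamma_j) ~ N(beta, sigma^2)
gamma <- rlnorm(m, meanlog = beta, sdlog = sigma)

# Time each site has been open by the interim look (zero if not yet open)
exposure <- pmax(t_look - open_time, 0)

# Given its rate, each site's count is Poisson with mean gamma_j * exposure_j
n <- rpois(m, lambda = gamma * exposure)

# Show only the sites that had opened by the interim look
data.frame(site = 1:m, open_time, gamma, exposure, n)[exposure > 0, ]
```

Sites that have not opened by the interim look contribute zero exposure and therefore zero participants.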

The three parameters in the model \((\lambda, \beta,\) and \(\sigma)\) are unknown, and so we place prior distributions on each. Formally, \[ \begin{align} \lambda & \sim Gamma(\delta, \epsilon) ,\\ \beta & \sim N(\mu, \nu^2),\\ \sigma & \sim Gamma(\rho, \pi), \end{align} \] where \(\delta, \epsilon, \mu, \nu, \rho\) and \(\pi\) are known hyperparameters. We assume that the three parameters are independent, so that the joint prior is the product of the three component priors. The model is therefore fully specified by these six hyperparameters.

The trial

For a trial with an internal pilot, the trial design is specified by the total number to recruit in the full study, \(N\), the number of sites, \(m\), and the timing of the interim analysis, \(t\) (in years). A final piece of necessary information is the threshold we will use to denote whether a trial is feasible or not. We specify this as a multiple of the expected time to fully recruit. The trial is therefore fully specified by these four variables.

Example - GUSTO

Let’s apply this framework to specifying a particular problem. The GUSTO trial aimed to recruit a total of \(N = 320\) participants from \(m = 20\) sites over a period of 3 years, and included an internal pilot at \(t = 0.5\) years into recruitment. We suppose that if the trial recruits within rel_thr = 1.2 times the expected timeframe, we should consider it feasible; otherwise, we consider it infeasible and would ideally like to stop early.
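The feasibility threshold amounts to a simple calculation. The sketch below assumes that the expected timeframe is the planned 3-year recruitment period, which may not be exactly how fahb derives it, and the recruitment times are hypothetical.

```r
rel_thr       <- 1.2  # relative threshold from the GUSTO example
expected_time <- 3    # assumption: the planned recruitment period, in years

# Longest recruitment time that still counts as feasible
max_feasible_time <- rel_thr * expected_time
max_feasible_time  # 3.6

# Hypothetical recruitment times (years) for three imagined trials
time_to_recruit <- c(2.8, 3.5, 4.1)
feasible <- time_to_recruit <= max_feasible_time
feasible
```

Under that assumption, a trial taking 3.5 years would still be classed as feasible, while one taking 4.1 years would not.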

Pre-trial feasibility assessments of recruiting sites led to an estimated total recruitment rate of 13 participants per month from all sites, and an estimated site opening rate of 10.53 sites per year. We constructed prior distributions for \(\beta, \sigma\) and \(\lambda\) which matched these expectations and allowed a realistic degree of prior uncertainty. We can then use all this information to specify the problem in the form of a fahb_problem object:

problem <- fahb_problem(N = 320, m = 20, t = 0.5, rel_thr = 1.2, 
                        so_hps = c(30, 2.85), 
                        mean_rr_hps = c(2, 0.329), 
                        sd_rr_hps = c(30, 100))

We have tried to use meaningful names for the function arguments, splitting the hyperparameters into three 2-element vectors: so_hps for the site opening rate \(\lambda\) (i.e. \(\delta\) and \(\epsilon\)); mean_rr_hps for the central parameter of the recruitment rate distribution \(\beta\) (i.e. \(\mu\) and \(\nu\)); and sd_rr_hps for the corresponding dispersion parameter \(\sigma\) (i.e. \(\rho\) and \(\pi\)).

It is generally helpful to visualise prior densities to check they align with our prior beliefs, and fahb will help you do that via the check_priors() function. In addition to densities of the three substantive parameters \(\lambda, \beta\) and \(\sigma\), it will also plot the implied prior distribution of the recruitment rate at a randomly selected site.

plots <- check_priors(problem)

(plots[[1]] + plots[[2]]) / (plots[[3]] + plots[[4]])
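A quick numerical check can complement these plots. This is a sketch in base R, independent of fahb. It assumes the Gamma priors use a shape and rate parameterisation, which matches the stated opening rate because 30 / 2.85 is about 10.53, and that the second element of mean_rr_hps is a standard deviation.

```r
set.seed(1)
n_draws <- 1e5

# Draws from the three priors (assumed shape/rate and mean/SD parameterisations)
lambda <- rgamma(n_draws, shape = 30, rate = 2.85)  # so_hps
beta   <- rnorm(n_draws, mean = 2, sd = 0.329)      # mean_rr_hps
sigma  <- rgamma(n_draws, shape = 30, rate = 100)   # sd_rr_hps

# Implied prior for the recruitment rate at a randomly selected site
gamma <- rlnorm(n_draws, meanlog = beta, sdlog = sigma)

mean(lambda)                            # prior mean site opening rate (per year)
quantile(gamma, c(0.025, 0.5, 0.975))   # site rate, participants per year
20 * mean(gamma) / 12                   # rough total per month once all 20 sites are open
```

The last quantity can be compared against the pre-trial estimate of the total monthly recruitment rate.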

Operating characteristics to inform design decisions

Internal pilots are often accompanied by pre-specified thresholds on three measures of recruitment: the total number recruited, the number of sites opened, and the recruitment rate (per site, per year). These thresholds guide the progression decision, suggesting we keep going only if all three are exceeded.

These thresholds can be hard to choose, but fahb can help us by searching for thresholds which produce good operating characteristics (OCs). We propose to use the false positive rate (FPR), the probability of progressing given that the trial is truly infeasible, and the false negative rate (FNR), the probability of stopping given that the trial is truly feasible. We estimate these by simulating n_sims trials and comparing their true feasibility (as dictated by rel_thr) with the progression decision made by the decision rule.
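The estimation step can be sketched as follows. The helper `estimate_ocs()` is hypothetical and not part of fahb, and the simulated inputs are placeholders rather than output from the recruitment model.

```r
# Estimate FPR and FNR from paired logical vectors, one element per simulated trial
estimate_ocs <- function(feasible, go) {
  c(FPR = mean(go[!feasible]),   # progressed although truly infeasible
    FNR = mean(!go[feasible]))   # stopped although truly feasible
}

set.seed(1)
n_sims <- 1000

# Placeholder simulation output: time to full recruitment and pilot recruitment
time_to_recruit <- rlnorm(n_sims, meanlog = log(3.3), sdlog = 0.25)
n_at_pilot      <- rpois(n_sims, lambda = 40 / time_to_recruit)

feasible <- time_to_recruit <= 1.2 * 3  # true feasibility via a relative threshold
go       <- n_at_pilot >= 10            # a single-threshold decision rule

estimate_ocs(feasible, go)
```

Raising the pilot threshold in this toy rule lowers the FPR at the cost of a higher FNR, which is the trade-off the search has to navigate.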

An alternative form of decision rule is based on a Bayesian analysis of the pilot data and the resulting expectation of the posterior predictive distribution of the time to recruit. This rule is simpler in the sense that there is only one threshold to specify. If the expected recruitment time exceeds this then we should stop, but otherwise we can continue.
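In code, this rule reduces to comparing a posterior predictive mean with the threshold. The sketch below uses placeholder draws in place of a real posterior predictive sample, and both `T_samples` and the threshold value are hypothetical.

```r
set.seed(1)

# Placeholder for posterior predictive draws of the time to full recruitment (years)
T_samples <- rgamma(4000, shape = 60, rate = 20)

threshold <- 3.3  # hypothetical decision threshold, in years

# Continue only if the expected recruitment time does not exceed the threshold
expected_T <- mean(T_samples)
go <- expected_T <= threshold
go
```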

To help us choose these thresholds we can use the forecast() function to run the necessary simulations and store them in our fahb_problem object, and then create a fahb_design object which takes this problem as its argument:

set.seed(9278635)

# Run the simulations
problem <- forecast(problem)

# Find some candidate decision rules and their OCs
design <- fahb_design(problem)

plot(design)

This plot shows the best operating characteristics which are possible for both types of decision rules. We can look at the specific OCs attained and the corresponding thresholds of the decision rules by printing the fahb_design object:

design

For example, suppose we wanted an FPR of at most 0.2. The two different decision rules which minimise the FNR under that constraint are:

# Standard progression criteria:
design$Prog_Crit_OCs[design$Prog_Crit_OCs$FPR == 0.2,]

# Approximate Bayesian decision rule:
design$Bayes_OCs[design$Bayes_OCs$FPR == 0.2,]

That is, the standard progression criteria tell us we should proceed only if we recruit at least 9 participants from at least 1 site, with an overall rate of at least 4.76. We get very similar OCs if instead we run a Bayesian analysis and proceed only if the expected time to recruit is under 3.15 years. Note that many of the suggested progression criteria have negative components which will always be exceeded and are, therefore, redundant.
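The subsetting used above relies on the FPR column containing the value 0.2 exactly. A more general way to pick a rule is to keep every row satisfying the constraint and take the one with the smallest FNR. The sketch below uses a made-up table of OCs. Its values and the `threshold` and `FNR` column names are assumptions, not output from fahb.

```r
# Made-up operating characteristics for five candidate thresholds
ocs <- data.frame(
  threshold = c(2.9, 3.0, 3.1, 3.3, 3.5),
  FPR       = c(0.05, 0.12, 0.19, 0.27, 0.40),
  FNR       = c(0.45, 0.33, 0.22, 0.15, 0.08)
)

# Return the rule with the smallest FNR among those with FPR <= max_fpr
pick_rule <- function(ocs, max_fpr) {
  ok <- ocs[ocs$FPR <= max_fpr, , drop = FALSE]
  ok[which.min(ok$FNR), , drop = FALSE]
}

pick_rule(ocs, max_fpr = 0.2)
```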

If we want to improve the OCs, we need more information in the internal pilot. For example, we could instead wait until \(t = 1\) year into the recruitment period:

problem <- fahb_problem(N = 320, m = 20 , t = 1, rel_thr = 1.2, 
                        so_hps = c(30, 2.85), 
                        mean_rr_hps = c(2, 0.329), 
                        sd_rr_hps = c(30, 100))

# Run the simulations
problem <- forecast(problem)

# Find some candidate decision rules and their OCs
design <- fahb_design(problem)

plot(design)
design$Prog_Crit_OCs[design$Prog_Crit_OCs$FPR == 0.2,]

Analysing the pilot data

After we have collected the pilot data, we need to analyse it and make our progression decision. If we have pre-specified standard progression criteria, this is trivial - we just calculate the three summary measures and check if they have all exceeded our chosen thresholds.
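A minimal sketch of that check is given below, using illustrative pilot counts from four sites and the thresholds quoted in the GUSTO example for an interim look at 0.5 years. Computing the rate as total recruits divided by total site-years is an assumption about how the measure is defined.

```r
n_pilot <- c(4, 8, 0, 2)          # participants recruited at each open site
t_pilot <- c(0.5, 0.4, 0.3, 0.2)  # years each site has been open

# The three summary measures
total_n    <- sum(n_pilot)            # total number recruited
sites_open <- length(n_pilot)         # number of sites opened
rate       <- total_n / sum(t_pilot)  # assumed definition: recruits per site-year

# Thresholds from the example: at least 9 participants, 1 site, rate 4.76
thr <- c(n = 9, sites = 1, rate = 4.76)

# Proceed only if all three thresholds are met
go <- total_n >= thr[["n"]] &&
  sites_open >= thr[["sites"]] &&
  rate >= thr[["rate"]]
go
```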

If we are making decisions using the Bayesian approach, the analysis is more involved. We need to take our recruitment data and use it to estimate the posterior distribution of the three parameters in our model (\(\lambda, \beta\) and \(\sigma\)) and of the random effects for all the sites which opened in the pilot. We then need to use these posteriors to estimate the posterior predictive distribution of the time until full recruitment, \(T\).

Suppose our pilot data are recruitment numbers \(4, 8, 0\) and \(2\), recruited from sites which had been open for \(0.5, 0.4, 0.3\) and \(0.2\) years respectively.

n_pilot <- c(4, 8, 0, 2)
t_pilot <- c(0.5, 0.4, 0.3, 0.2)

analysis <- fahb_analysis(n_pilot, t_pilot, problem)

We can use print() and plot() methods to summarise results, although note that the posterior predictive samples and the brmsfit model are both held within the analysis object. This is to allow the user to summarise results in other ways if they please.

print(analysis)

plots <- plot(analysis)
(plots[[1]] + plots[[2]]) / (plots[[3]] + plots[[4]])

External pilots

By default fahb assumes an internal pilot, but we can use it for external pilots too. For example, suppose we use the same model as in the GUSTO example. Now we assume the external pilot will run from six sites until either 80 participants have been recruited or we reach a time equal to a fraction t_int = 0.33 of the expected recruitment time of the main trial.

problem <- fahb_problem(n_ext = 80, m_ext = 6, t_int = 0.33)

problem <- forecast(problem)

design <- fahb_design(problem)

plot(design)
design

And analysis:

n_pilot <- c(4, 8, 0, 2)
t_pilot <- c(0.5, 0.4, 0.3, 0.2)
site_t <- 0.6

analysis <- fahb_analysis(n_pilot, t_pilot, problem, site_t)

print(analysis)

plots <- plot(analysis)
(plots[[1]] + plots[[2]]) / (plots[[3]] + plots[[4]])
