A “snake_case” filter system for R
if (!requireNamespace("remotes")) {
  install.packages("remotes")
}
remotes::install_github(
  repo = "openpharma/filters",
  upgrade = "never"
)
library(filters)
library(magrittr)
library(random.cdisc.data)
library(rtables)
library(tern)
set.seed(1)
adsl <- radsl()
adae <- radae(adsl)
vads <- list(adsl = adsl, adae = adae)
{filters} comes with a built-in filter library. You can list all available filters using list_all_filters().
list_all_filters()
# A tibble: 272 x 4
   id     title                      target condition
   <chr>  <chr>                      <chr>  <chr>
 1 COV    Confirmed/Suspected COVID… ADAE   ACOVFL == 'Y'
 2 COVAS  AEs Associated with COVID… ADAE   ACOVASFL == 'Y'
 3 CTC35  Grade 3-5 Adverse Events   ADAE   ATOXGR %in% c('3', '4', '5')
 4 DSC    Adverse Events Leading to… ADAE   AEACN == 'DRUG WITHDRAWN'
 5 DSM    Adverse Events Leading to… ADAE   AEACN %in% c('DOSE INCREASED',…
 6 FATAL  Fatal Adverse Events       ADAE   AESDTH == 'Y'
 7 NCOV   Excluding Confirmed/Suspe… ADAE   ACOVFL != 'Y'
 8 NCOVAS AEs not Associated with C… ADAE   ACOVASFL != 'Y'
 9 NFATAL Non-fatal Adverse Events   ADAE   AESDTH == 'N'
10 NREL   Adverse Events not Relate… ADAE   AREL == 'N'
# … with 262 more rows
To add a new filter, use add_filter(). The last argument, condition, defines the condition used to filter the datasets later on. It is passed to subset() when you call apply_filter().
add_filter(
  id = "CTC34",
  title = "Grade 3-4 Adverse Events",
  target = "ADAE",
  condition = AETOXGR %in% c("4", "5")
)
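Because the condition is eventually handed to subset(), it can be any expression that subset() accepts for a data frame. The following sketch uses only base R and a small made-up data frame (toy_adae is hypothetical and not part of {filters}) to show what a condition of this kind selects:

```r
# Hypothetical toy data standing in for an ADAE dataset.
toy_adae <- data.frame(
  USUBJID = c("01", "02", "03", "04"),
  AETOXGR = c("2", "4", "5", "3"),
  stringsAsFactors = FALSE
)

# The same kind of condition as in the add_filter() call above,
# evaluated directly by subset(): only rows with grade 4 or 5 are kept.
high_grade <- subset(toy_adae, AETOXGR %in% c("4", "5"))
high_grade$USUBJID
```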
Alternatively, you can use load_filters() to load filter definitions from a YAML file. The file should be structured like this:
CTC4:
  title: Grade 4 Adverse Events
  target: ADAE
  condition: ATOXGR == "4"
TP53WT:
  title: TP53 Wild Type
  target: ADSL
  condition: TP53 == "WILD TYPE"
file_path <- system.file("filters_eg.yaml", package = "filters")
load_filters(file_path)
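How load_filters() parses the file is not shown here. Purely as an illustration of the idea, the sketch below reads a definition of the same shape with the {yaml} package and turns the condition string into an unevaluated expression with str2lang(). The use of {yaml} and str2lang(), and the toy data frame, are assumptions made for this example and not a description of the package internals.

```r
library(yaml)

# A filter definition in the same shape as the file shown above.
yaml_text <- '
CTC4:
  title: Grade 4 Adverse Events
  target: ADAE
  condition: ATOXGR == "4"
'

# Parse the YAML text into a nested named list.
defs <- yaml.load(yaml_text)

# Convert the condition string into a language object (an unevaluated call).
cond <- str2lang(defs$CTC4$condition)

# Evaluate the condition against a hypothetical toy dataset.
toy <- data.frame(ATOXGR = c("3", "4", "4"), stringsAsFactors = FALSE)
keep <- eval(cond, toy)
toy[keep, , drop = FALSE]
```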
You can confirm that filters have been added successfully by using get_filter().
get_filter("CTC34")
$title
[1] "Grade 3-4 Adverse Events"
$target
[1] "ADAE"
$condition
AETOXGR %in% c("4", "5")
If you ask for a filter that does not exist, get_filter() throws an error.
get_filter("GIDIS")
Error: Filter 'GIDIS' does not exist.
To overwrite an existing filter, you have to set overwrite = TRUE. Otherwise an error is thrown.
add_filter(
  id = "FATAL",
  title = "Fatal Adverse Events",
  target = "ADAE",
  condition = ATOXGR == "5"
)
Error: Filter 'FATAL' already exists. Set `overwrite = TRUE` to force overwriting the existing filter definition.
add_filter(
  id = "FATAL",
  title = "Fatal Adverse Events",
  target = "ADAE",
  condition = ATOXGR == "5",
  overwrite = TRUE
)
You can use apply_filter() to filter a single dataset or a list of multiple datasets.
adsl_se <- apply_filter(adsl, "SE")
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
adae_ctc34_ser <- apply_filter(adae, "CTC34_SER")
Filters 'CTC34', 'SER' matched target ADAE.
216/1967 records matched the filter condition `AETOXGR %in% c('4', '5') & AESER == 'Y'`.
filtered_datasets <- apply_filter(vads, "CTC34_SER_SE")
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
Filters 'CTC34', 'SER' matched target ADAE.
216/1967 records matched the filter condition `AETOXGR %in% c('4', '5') & AESER == 'Y'`.
As you can see, apply_filter() reports which filter IDs matched the dataset. The matching is done by the name of the input dataset and ignores case: the dataset name can be in upper case, lower case, or a mix of both.
ADSL <- adsl
adsl_it <- apply_filter(ADSL, "IT")
Filter 'IT' matched target ADSL.
400/400 records matched the filter condition `ITTFL == 'Y'`.
If your dataset is not named in a standard way, you can tell apply_filter() which dataset it is by setting the target argument.
sl <- adsl
sl_it1 <- apply_filter(sl, "IT")
No filter matched target SL.
sl_it2 <- apply_filter(sl, "IT", target = "ADSL")
Filter 'IT' matched target ADSL.
400/400 records matched the filter condition `ITTFL == 'Y'`.
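One way such name-based matching with an optional override can be written in R is to capture the expression passed as the argument and convert it to upper case. The helper below, resolve_target(), is a hypothetical illustration and not a function of {filters}; the package may resolve the target differently.

```r
# Hypothetical helper: use the explicit target if one is given, otherwise
# fall back to the name of the variable that was passed in. The result is
# upper-cased so that the comparison ignores case.
resolve_target <- function(data, target = NULL) {
  if (is.null(target)) {
    target <- deparse(substitute(data))
  }
  toupper(target)
}

# Toy datasets with differently cased and non-standard names.
advs <- data.frame(USUBJID = "01")
AdVs <- advs
vs <- advs

resolve_target(advs)                 # derived from the variable name
resolve_target(AdVs)                 # case does not matter
resolve_target(vs)                   # non-standard name, no match expected
resolve_target(vs, target = "advs")  # explicit override
```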
The {filters} package works well with the {rtables} and {tern} packages. The following example defines a function that creates a table:
t_ae <- function(datasets) {
  anl <- merge(
    x = datasets$adsl,
    y = datasets$adae,
    by = c("STUDYID", "USUBJID"),
    all = FALSE, # inner join
    suffixes = c("", "_ADAE")
  )

  split_fun <- drop_split_levels

  lyt <- basic_table(show_colcounts = TRUE) %>%
    split_cols_by(var = "ARM") %>%
    add_overall_col(label = "All Patients") %>%
    analyze_num_patients(
      vars = "USUBJID",
      .stats = c("unique", "nonunique"),
      .labels = c(
        unique = "Total number of patients with at least one adverse event",
        nonunique = "Overall total number of events"
      )
    ) %>%
    split_rows_by(
      "AEBODSYS",
      child_labels = "visible",
      nested = FALSE,
      split_fun = split_fun,
      label_pos = "topleft",
      split_label = obj_label(adae$AEBODSYS)
    ) %>%
    summarize_num_patients(
      var = "USUBJID",
      .stats = c("unique", "nonunique"),
      .labels = c(
        unique = "Total number of patients with at least one adverse event",
        nonunique = "Total number of events"
      )
    ) %>%
    count_occurrences(
      vars = "AEDECOD",
      .indent_mods = -1L
    ) %>%
    append_varlabels(adae, "AEDECOD", indent = 1L)

  result <- build_table(
    lyt,
    df = datasets$adae,
    alt_counts_df = datasets$adsl
  )

  return(result)
}
You can easily create multiple outputs with this function by applying the filters to the input datasets before passing them to t_ae().
vads %>% apply_filter("SE") %>% t_ae()
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
Body System or Organ Class                                   A: Drug X     B: Placebo    C: Combination   All Patients
  Dictionary-Derived Term                                    (N=133)       (N=141)       (N=126)          (N=400)
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
  Total number of patients with at least one adverse event   111 (83.5%)   132 (93.6%)   119 (94.4%)      362 (90.5%)
  Overall total number of events                             636           755           655              2046
cl A.1
  Total number of patients with at least one adverse event   63 (47.4%)    79 (56.0%)    71 (56.3%)       213 (53.2%)
  Total number of events                                     123           144           133              400
  dcd A.1.1.1.1                                              47 (35.3%)    63 (44.7%)    50 (39.7%)       160 (40.0%)
  dcd A.1.1.1.2                                              42 (31.6%)    47 (33.3%)    44 (34.9%)       133 (33.2%)
cl B.1
  Total number of patients with at least one adverse event   47 (35.3%)    49 (34.8%)    59 (46.8%)       155 (38.8%)
  Total number of events                                     73            63            75               211
  dcd B.1.1.1.1                                              47 (35.3%)    49 (34.8%)    59 (46.8%)       155 (38.8%)
cl B.2
  Total number of patients with at least one adverse event   73 (54.9%)    88 (62.4%)    73 (57.9%)       234 (58.5%)
  Total number of events                                     132           156           137              425
  dcd B.2.1.2.1                                              44 (33.1%)    56 (39.7%)    50 (39.7%)       150 (37.5%)
  dcd B.2.2.3.1                                              48 (36.1%)    59 (41.8%)    44 (34.9%)       151 (37.8%)
cl C.1
  Total number of patients with at least one adverse event   50 (37.6%)    53 (37.6%)    42 (33.3%)       145 (36.2%)
  Total number of events                                     62            75            62               199
  dcd C.1.1.1.3                                              50 (37.6%)    53 (37.6%)    42 (33.3%)       145 (36.2%)
cl C.2
  Total number of patients with at least one adverse event   50 (37.6%)    65 (46.1%)    50 (39.7%)       165 (41.2%)
  Total number of events                                     67            87            63               217
  dcd C.2.1.2.1                                              50 (37.6%)    65 (46.1%)    50 (39.7%)       165 (41.2%)
cl D.1
  Total number of patients with at least one adverse event   74 (55.6%)    95 (67.4%)    72 (57.1%)       241 (60.2%)
  Total number of events                                     120           158           112              390
  dcd D.1.1.1.1                                              37 (27.8%)    59 (41.8%)    35 (27.8%)       131 (32.8%)
  dcd D.1.1.4.2                                              54 (40.6%)    63 (44.7%)    48 (38.1%)       165 (41.2%)
cl D.2
  Total number of patients with at least one adverse event   43 (32.3%)    54 (38.3%)    56 (44.4%)       153 (38.2%)
  Total number of events                                     59            72            73               204
  dcd D.2.1.5.3                                              43 (32.3%)    54 (38.3%)    56 (44.4%)       153 (38.2%)
vads %>% apply_filter("SER_SE") %>% t_ae()
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
Filter 'SER' matched target ADAE.
581/1967 records matched the filter condition `AESER == 'Y'`.
Body System or Organ Class                                   A: Drug X     B: Placebo    C: Combination   All Patients
  Dictionary-Derived Term                                    (N=133)       (N=141)       (N=126)          (N=400)
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
  Total number of patients with at least one adverse event   93 (69.9%)    110 (78.0%)   98 (77.8%)       301 (75.2%)
  Overall total number of events                             248           280           246              774
cl A.1
  Total number of patients with at least one adverse event   42 (31.6%)    47 (33.3%)    44 (34.9%)       133 (33.2%)
  Total number of events                                     54            63            58               175
  dcd A.1.1.1.2                                              42 (31.6%)    47 (33.3%)    44 (34.9%)       133 (33.2%)
cl B.1
  Total number of patients with at least one adverse event   47 (35.3%)    49 (34.8%)    59 (46.8%)       155 (38.8%)
  Total number of events                                     73            63            75               211
  dcd B.1.1.1.1                                              47 (35.3%)    49 (34.8%)    59 (46.8%)       155 (38.8%)
cl B.2
  Total number of patients with at least one adverse event   48 (36.1%)    59 (41.8%)    44 (34.9%)       151 (37.8%)
  Total number of events                                     74            78            65               217
  dcd B.2.2.3.1                                              48 (36.1%)    59 (41.8%)    44 (34.9%)       151 (37.8%)
cl D.1
  Total number of patients with at least one adverse event   37 (27.8%)    59 (41.8%)    35 (27.8%)       131 (32.8%)
  Total number of events                                     47            76            48               171
  dcd D.1.1.1.1                                              37 (27.8%)    59 (41.8%)    35 (27.8%)       131 (32.8%)
The filters you create using add_filter() only persist for the duration of your R session. Whenever you restart your R session, you have to re-create them. The simplest way to do so is to put all your filter definitions in a filters.yml file, structured as described above, and call load_filters("path/to/filters.yml") before creating outputs.
If you pass an existing filter that does not match your target dataset, no warning or error is thrown. Instead, apply_filter() only tells you which filters it actually used. Thus, checking that only valid filters are passed to apply_filter() is up to you.
add_filter(
  id = "INFCT",
  title = "Infections and Infestations",
  target = "ADAE",
  condition = AEBODSYS == "INFECTIONS AND INFESTATIONS"
)
adsl_filtered <- apply_filter(adsl, "DIABP_IT")
Filter 'IT' matched target ADSL.
400/400 records matched the filter condition `ITTFL == 'Y'`.
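Since filters that do not match are silently skipped, a small pre-check can help catch mistakes before calling apply_filter(). The sketch below is a hypothetical helper, unmatched_ids(), which is not part of {filters}. It assumes that a suffix is a set of filter IDs joined by underscores, as the examples above suggest, and that IDs themselves contain no underscore. The filter table is passed in explicitly; here it is a toy data frame, but a table with id and target columns such as the one returned by list_all_filters() could be used instead.

```r
# Hypothetical helper: return the IDs in `suffix` that are unknown or whose
# target differs from the dataset you intend to filter.
unmatched_ids <- function(suffix, target, filter_table) {
  ids <- strsplit(suffix, "_", fixed = TRUE)[[1]]
  known <- filter_table[filter_table$id %in% ids, ]
  matched <- known$id[toupper(known$target) == toupper(target)]
  setdiff(ids, matched)
}

# Toy filter table using IDs and targets that appear in this document.
toy_filters <- data.frame(
  id = c("IT", "SE", "SER", "INFCT"),
  target = c("ADSL", "ADSL", "ADAE", "ADAE"),
  stringsAsFactors = FALSE
)

# INFCT targets ADAE, so it would be skipped when filtering ADSL.
skipped <- unmatched_ids("INFCT_IT", target = "ADSL", filter_table = toy_filters)
skipped
```

Whether an unmatched ID should trigger a warning or a hard stop is a design choice left to the calling code.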
Internally, {filters} stores the filter definitions inside the .filters environment defined in R/zzz.R. When you add a filter with add_filter(), a new variable named after the ID is created inside this environment. This variable is a list that stores the title, the target, and the condition as a quoted expression. When you use apply_filter(), the function looks for variables in .filters matching the provided suffixes. It then maps the filters to their target datasets and finally builds a call to subset() with the dataset as the first argument and the condition of the matching filters as the second. This call is then evaluated using eval() and the result is returned.
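The mechanism described above can be sketched in base R roughly as follows. This is not the package's source code: the names registry, register_filter() and apply_registered() are hypothetical, the target is passed explicitly instead of being derived from the dataset name, and splitting the suffix on underscores and joining conditions with & are assumptions based on the messages shown earlier in this document.

```r
# A private environment that acts as the filter registry (hypothetical name).
registry <- new.env()

# Store title, target and the *unevaluated* condition under the filter ID.
register_filter <- function(id, title, target, condition) {
  assign(
    id,
    list(title = title, target = target, condition = substitute(condition)),
    envir = registry
  )
  invisible(NULL)
}

# Look up the IDs encoded in the suffix, keep those defined for the given
# target, combine their conditions with `&`, then build and evaluate a
# subset() call.
apply_registered <- function(data, suffix, target) {
  ids <- strsplit(suffix, "_", fixed = TRUE)[[1]]
  defs <- mget(ids, envir = registry, ifnotfound = list(NULL))
  defs <- Filter(function(d) !is.null(d) && d$target == target, defs)
  if (length(defs) == 0) {
    return(data) # nothing matched, return the data unchanged
  }
  conds <- lapply(defs, `[[`, "condition")
  combined <- Reduce(function(a, b) call("&", a, b), conds)
  eval(call("subset", data, combined))
}

# Usage with toy filters and toy data.
register_filter("G45", "Grade 4-5 Adverse Events", "ADAE", AETOXGR %in% c("4", "5"))
register_filter("SER", "Serious Adverse Events", "ADAE", AESER == "Y")

toy <- data.frame(
  AETOXGR = c("4", "5", "2", "5"),
  AESER = c("Y", "N", "Y", "Y"),
  stringsAsFactors = FALSE
)
out <- apply_registered(toy, "G45_SER", target = "ADAE")
out
```

Storing the condition unevaluated is what allows the same definition to be applied later to any dataset that has the referenced columns.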