Title: High Precision Timing of R Expressions
Version: 1.1.4
Description: Tools to accurately benchmark and analyze execution times for R expressions.
License: MIT + file LICENSE
URL: https://bench.r-lib.org/, https://github.com/r-lib/bench
BugReports: https://github.com/r-lib/bench/issues
Depends: R (≥ 4.0.0)
Imports: glue (≥ 1.8.0), methods, pillar (≥ 1.10.1), profmem (≥ 0.6.0), rlang (≥ 1.1.4), stats, tibble (≥ 3.2.1), utils
Suggests: covr, dplyr, forcats, ggbeeswarm, ggplot2 (≥ 3.5.1), ggridges, parallel, scales, testthat (≥ 3.2.3), tidyr (≥ 1.3.1), vctrs (≥ 0.6.5), withr
Config/Needs/website: tidyverse/tidytemplate
Config/testthat/edition: 3
Config/usethis/last-upkeep: 2025-01-16
Encoding: UTF-8
RoxygenNote: 7.3.2
NeedsCompilation: yes
Packaged: 2025-01-16 22:07:42 UTC; davis
Author: Jim Hester [aut], Davis Vaughan [aut, cre], Drew Schmidt [ctb] (read_proc_file implementation), Posit Software, PBC [cph, fnd]
Maintainer: Davis Vaughan <davis@posit.co>
Repository: CRAN
Date/Publication: 2025-01-16 22:40:07 UTC
bench: High Precision Timing of R Expressions
Description
Tools to accurately benchmark and analyze execution times for R expressions.
Author(s)
Maintainer: Davis Vaughan <davis@posit.co>
Authors:
Jim Hester
Other contributors:
Drew Schmidt (read_proc_file implementation) [contributor]
Posit Software, PBC [copyright holder, funder]
See Also
Useful links:
https://bench.r-lib.org/
https://github.com/r-lib/bench
Report bugs at https://github.com/r-lib/bench/issues
Examples
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))
# `bench::mark()` implicitly calls summary() on its results
results <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500))
# However, you can also call summary() explicitly to filter garbage
# collections differently.
summary(results, filter_gc = FALSE)
# Or to output relative times
summary(results, relative = TRUE)
Coerce to a bench_mark object
Description
This is typically needed only if you are performing additional manipulations after calling mark().
Usage
as_bench_mark(x)
Arguments
x: Object to be coerced.
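Examples
# Not part of the original manual: a minimal sketch, assuming dplyr is
# installed. dplyr verbs return a plain tibble, dropping the bench_mark
# class; as_bench_mark() restores it so bench's print and plot methods apply.
library(dplyr)
results <- bench::mark(1:1000 + 1L, min_time = 0.1)
results %>%
  arrange(median) %>%
  as_bench_mark()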
Human readable times
Description
Construct, manipulate and display vectors of elapsed times in seconds. These are numeric vectors, so you can compare them numerically, but they can also be compared to human readable values such as '10ms'.
Usage
as_bench_time(x)
Arguments
x: A numeric or character vector. Character representations can use shorthand time units (see examples).
Examples
as_bench_time("1ns")
as_bench_time("1")
as_bench_time("1us")
as_bench_time("1ms")
as_bench_time("1s")
as_bench_time("100ns") < "1ms"
sum(as_bench_time(c("100ns", "1us", "1ms")))
Autoplot method for bench_mark objects
Description
Autoplot method for bench_mark objects
Usage
autoplot.bench_mark(
object,
type = c("beeswarm", "jitter", "ridge", "boxplot", "violin"),
...
)
## S3 method for class 'bench_mark'
plot(x, ..., type = c("beeswarm", "jitter", "ridge", "boxplot", "violin"), y)
Arguments
object: A bench_mark object.
type: The type of plot. The plotting geoms used for each type are: beeswarm - ggbeeswarm::geom_quasirandom(); jitter - ggplot2::geom_jitter(); ridge - ggridges::geom_density_ridges(); boxplot - ggplot2::geom_boxplot(); violin - ggplot2::geom_violin().
...: Additional arguments passed to the plotting geom.
x: A bench_mark object.
y: Ignored, required for compatibility with the plot() generic.
Details
This function requires some optional dependencies: ggplot2, tidyr, and, depending on the plot type, ggbeeswarm or ggridges.
For the beeswarm and jitter types, the points are colored by the highest level of garbage collection performed during each iteration.
For plots with two parameters, ggplot2::facet_grid() is used to construct a 2d facet. For other numbers of parameters, ggplot2::facet_wrap() is used instead.
Examples
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))
res <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500))
if (require(ggplot2) && require(tidyr) && require(ggbeeswarm)) {
  # Beeswarm plot
  autoplot(res)
  # Ridge (joyplot)
  autoplot(res, "ridge")
  # To order the plots by execution time, reorder the factor levels of the
  # expressions.
  if (require(dplyr) && require(forcats)) {
    res %>%
      mutate(expression = forcats::fct_reorder(as.character(expression), min, .desc = TRUE)) %>%
      as_bench_mark() %>%
      autoplot("violin")
  }
}
Human readable memory sizes
Description
Construct, manipulate and display vectors of byte sizes. These are numeric vectors, so you can compare them numerically, but they can also be compared to human readable values such as '10MB'.
Usage
as_bench_bytes(x)
bench_bytes(x)
Arguments
x: A numeric or character vector. Character representations can use shorthand sizes (see examples).
Details
These memory sizes are always assumed to be base 1024, rather than 1000.
Examples
bench_bytes("1")
bench_bytes("1K")
bench_bytes("1Kb")
bench_bytes("1KiB")
bench_bytes("1MB")
bench_bytes("1KB") < "1MB"
sum(bench_bytes(c("1MB", "5MB", "500KB")))
Benchmark bytes transformation
Description
This both log transforms the sizes and formats the labels as bench_bytes objects.
Usage
bench_bytes_trans(base = 2)
Arguments
base: Base of the logarithm.
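Examples
# Not part of the original manual: a hedged sketch of using the
# transformation with a plain ggplot2 continuous scale (ggplot2 >= 3.5 is
# assumed for the transform argument; the data are made up).
library(ggplot2)
sizes <- data.frame(
  n = 10^(1:5),
  bytes = as.numeric(bench::bench_bytes(c("1KB", "32KB", "1MB", "32MB", "1GB")))
)
# Log-spaced breaks with human readable byte labels on the y axis
ggplot(sizes, aes(n, bytes)) +
  geom_point() +
  scale_x_log10() +
  scale_y_continuous(transform = bench::bench_bytes_trans(base = 2))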
Get system load averages
Description
Uses OS system APIs to return the load average for the past 1, 5 and 15 minutes.
Usage
bench_load_average()
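Examples
# Not part of the original manual: usage is a single call returning the
# 1, 5 and 15 minute load averages (this may be unsupported on some
# platforms, e.g. Windows).
bench_load_average()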
Measure memory that an expression used.
Description
Measure memory that an expression used.
Usage
bench_memory(expr)
Arguments
expr: An expression to be measured.
Value
A tibble with two columns:
- mem_alloc - The total amount of memory allocated.
- memory - The raw memory allocations as parsed by profmem::readRprofmem().
Examples
if (capabilities("profmem")) {
bench_memory(1 + 1:10000)
}
Retrieve the current and maximum memory from the R process
Description
The memory reported here will likely differ from that reported by gc(), as this includes all memory from the R process, including any child processes and memory allocated outside R's garbage collector heap.
Usage
bench_process_memory()
Details
The OS APIs used are platform specific, with separate implementations for Windows, macOS and Linux.
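Examples
# Not part of the original manual: a minimal sketch. Snapshots taken before
# and after a large allocation should differ.
bench_process_memory()
x <- runif(1e7)
bench_process_memory()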
Measure Process CPU and real time that an expression used.
Description
Measure Process CPU and real time that an expression used.
Usage
bench_time(expr)
Arguments
expr: An expression to be timed.
Details
On some systems (such as macOS) the process clock has lower precision than the realtime clock; as a result, the process time may occasionally be larger than the real time for fast expressions.
Value
A bench_time object with two values:
- process - The process CPU usage of the expression evaluation.
- real - The wallclock time of the expression evaluation.
See Also
bench_memory() to measure memory allocations for a given expression.
Examples
# This will use ~.5 seconds of real time, but very little process time.
bench_time(Sys.sleep(.5))
Benchmark time transformation
Description
This both log transforms the times and formats the labels as bench_time objects.
Usage
bench_time_trans(base = 10)
Arguments
base: Base of the logarithm.
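Examples
# Not part of the original manual: a hedged sketch mirroring the bytes
# transformation, with a plain ggplot2 scale (ggplot2 >= 3.5 is assumed for
# the transform argument).
library(ggplot2)
res <- bench::mark(runif(1e5), min_time = 0.1)
# Raw iteration times for the first expression, kept as a bench_time vector
times <- data.frame(iteration = seq_along(res$time[[1]]), time = res$time[[1]])
ggplot(times, aes(iteration, time)) +
  geom_point() +
  scale_y_continuous(transform = bench::bench_time_trans(base = 10))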
Return the current high-resolution real time.
Description
Time is expressed as seconds since some arbitrary time in the past; it is not correlated in any way to the time of day, and thus is not subject to resetting or drifting. The hi-res timer is ideally suited to performance measurement tasks, where cheap, accurate interval timing is required.
Usage
hires_time()
Examples
hires_time()
# R prints doubles with 7 significant digits by default; see greater
# precision by setting the digits argument when printing
print(hires_time(), digits = 20)
# Generally used by recording two times and then subtracting them
start <- hires_time()
end <- hires_time()
elapsed <- end - start
elapsed
Custom printing function for bench_mark objects in knitr documents
Description
By default, data columns (result, memory, time, gc) are omitted when printing in knitr. If you would like to include these columns, set the knitr chunk option bench.all_columns = TRUE.
Usage
knit_print.bench_mark(x, ..., options)
Arguments
x: An R object to be printed.
...: Additional arguments passed to the S3 method. Currently ignored, except two optional arguments, options and inline; see knitr::knit_print() for details.
options: A list of knitr chunk options set in the currently evaluated chunk.
Details
You can set bench.all_columns = TRUE to show all columns of the bench_mark object:
```{r, bench.all_columns = TRUE}
bench::mark(
  subset(mtcars, cyl == 3),
  mtcars[mtcars$cyl == 3, ]
)
```
Benchmark a series of functions
Description
Benchmark a list of quoted expressions. Each expression will always run at least twice: once to measure the memory allocation and store the results, and one or more times to measure timing.
Usage
mark(
...,
min_time = 0.5,
iterations = NULL,
min_iterations = 1,
max_iterations = 10000,
check = TRUE,
memory = capabilities("profmem"),
filter_gc = TRUE,
relative = FALSE,
time_unit = NULL,
exprs = NULL,
env = parent.frame()
)
Arguments
...: Expressions to benchmark; if named, the expression column will use the name, otherwise the deparsed expression.
min_time: The minimum number of seconds to run each expression; set to Inf to always run max_iterations times instead.
iterations: If not NULL, the default, run each expression exactly this number of iterations, overriding both min_iterations and max_iterations.
min_iterations: Each expression will be evaluated a minimum of min_iterations times.
max_iterations: Each expression will be evaluated a maximum of max_iterations times.
check: Check if results are consistent. If TRUE, checking is done with all.equal(); if FALSE, checking is disabled. If check is a function, that function will be called with each pair of results to determine consistency.
memory: If TRUE (the default when R is compiled with memory profiling), measure memory allocations using utils::Rprofmem(); if FALSE, memory tracking is disabled.
filter_gc: If TRUE remove iterations that contained at least one garbage collection before summarizing. If TRUE but an expression had a garbage collection in every iteration, filtering is disabled, with a warning.
relative: If TRUE all summaries are computed relative to the minimum execution time rather than as absolute times.
time_unit: If NULL the times are reported in a human readable fashion depending on each value. If one of 'ns', 'us', 'ms', 's', 'm', 'h', 'd', 'w' the times are instead expressed in the requested unit.
exprs: A list of quoted expressions. If supplied, overrides expressions defined in ....
env: The environment in which to evaluate the expressions.
Value
A tibble with the additional summary columns. The following summary columns are computed:
- expression (bench_expr) - The deparsed expression that was evaluated (or its name if one was provided).
- min (bench_time) - The minimum execution time.
- median (bench_time) - The sample median of execution time.
- itr/sec (double) - The estimated number of executions performed per second.
- mem_alloc (bench_bytes) - Total amount of memory allocated by R while running the expression. Memory allocated outside the R heap, e.g. by malloc() or new directly, is not tracked; take care to avoid misinterpreting the results if running code that may do this.
- gc/sec (double) - The number of garbage collections per second.
- n_itr (integer) - Total number of iterations after filtering garbage collections (if filter_gc == TRUE).
- n_gc (double) - Total number of garbage collections performed over all iterations. This is a pseudo-measure of the pressure on the garbage collector; if it varies greatly between two alternatives, the one with fewer collections will generally cause fewer allocations in real usage.
- total_time (bench_time) - The total time to perform the benchmarks.
- result (list) - A list column of the object(s) returned by the evaluated expression(s).
- memory (list) - A list column with results from Rprofmem().
- time (list) - A list column of bench_time vectors for each evaluated expression.
- gc (list) - A list column with tibbles containing the level of garbage collection (0-2, columns) for each iteration (rows).
See Also
press()
to run benchmarks across a grid of parameters.
Examples
dat <- data.frame(x = runif(100, 1, 1000), y = runif(100, 1, 1000))
mark(
  min_time = .1,
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500))
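# Not from the original manual: a sketch of the check argument. These two
# expressions return different results, so the default consistency check
# would fail; check = FALSE disables result comparison.
mark(
  check = FALSE,
  head(dat, 5),
  head(dat, 10))
# A custom check function receives each pair of results; this hypothetical
# relaxed check compares only column names.
mark(
  check = function(a, b) identical(names(a), names(b)),
  head(dat, 5),
  head(dat, 10))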
Run setup code and benchmarks across a grid of parameters
Description
press() is used to run mark() across a grid of parameters and then press the results together.
The parameters you want to set are given as named arguments, and a grid of all possible combinations is automatically created.
The code to setup and benchmark is given by one unnamed expression (often delimited by { }).
If replicates are desired, a dummy variable can be used, e.g. rep = 1:5.
Usage
press(..., .grid = NULL, .quiet = FALSE)
Arguments
...: If named, parameters to define, if unnamed the expression to run. Only one unnamed expression is permitted.
.grid: A pre-built grid of values to use, typically a data.frame or tibble. This is useful if you only want to benchmark a subset of all possible combinations.
.quiet: If TRUE, suppress the progress messages printed for each parameter combination.
Examples
# Helper function to create a simple data.frame of the specified dimensions
create_df <- function(rows, cols) {
  as.data.frame(setNames(
    replicate(cols, runif(rows, 1, 1000), simplify = FALSE),
    rep_len(c("x", letters), cols)))
}
# Run 4 data sizes across 3 samples with 2 replicates (24 total benchmarks)
press(
  rows = c(1000, 10000),
  cols = c(10, 100),
  rep = 1:2,
  {
    dat <- create_df(rows, cols)
    bench::mark(
      min_time = .05,
      bracket = dat[dat$x > 500, ],
      which = dat[which(dat$x > 500), ],
      subset = subset(dat, x > 500)
    )
  }
)
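# Not from the original manual: a sketch of the .grid argument, reusing
# create_df() from above. Build the full grid, then keep a hand-picked
# subset of combinations (the filter is hypothetical).
grid <- expand.grid(rows = c(1000, 10000), cols = c(10, 100))
grid <- subset(grid, rows / cols >= 100)
press(
  .grid = grid,
  {
    dat <- create_df(rows, cols)
    bench::mark(min_time = .05, dat[dat$x > 500, ])
  }
)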
Position and color scales for bench_expr data
Description
Default scales for the bench_expr class; these are added automatically to plots that use bench_expr objects.
Usage
scale_x_bench_expr(...)
scale_y_bench_expr(...)
scale_colour_bench_expr(
  palette = scales::hue_pal(...),
  ...,
  aesthetics = "colour"
)
scale_color_bench_expr(
  palette = scales::hue_pal(...),
  ...,
  aesthetics = "colour"
)
Position scales for bench_bytes and bench_time data
Description
Default scales for the bench_bytes and bench_time classes; these are added automatically to plots that use bench_bytes or bench_time objects.
Usage
scale_x_bench_bytes(base = 10, ...)
scale_y_bench_bytes(base = 10, ...)
scale_x_bench_time(base = 10, ...)
scale_y_bench_time(base = 10, ...)
Arguments
base: The base of the logarithm; if NULL, a non-logarithmic (linear) scale is used.
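Examples
# Not part of the original manual: these scales are normally added
# automatically for bench_time and bench_bytes columns, so the main reason
# to call one directly is to change its parameters, e.g. the base.
library(bench)
library(ggplot2)
res <- mark(runif(1e5), min_time = 0.1)
# Raw iteration times for the first expression, kept as a bench_time vector
times <- data.frame(iteration = seq_along(res$time[[1]]), time = res$time[[1]])
ggplot(times, aes(iteration, time)) +
  geom_point() +
  scale_y_bench_time(base = 2)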
Summarize mark results.
Description
Summarize mark results.
Usage
## S3 method for class 'bench_mark'
summary(object, filter_gc = TRUE, relative = FALSE, time_unit = NULL, ...)
Arguments
object: A bench_mark object to summarize.
filter_gc: If TRUE remove iterations that contained at least one garbage collection before summarizing. If TRUE but an expression had a garbage collection in every iteration, filtering is disabled, with a warning.
relative: If TRUE all summaries are computed relative to the minimum execution time rather than as absolute times.
time_unit: If NULL the times are reported in a human readable fashion depending on each value. If one of 'ns', 'us', 'ms', 's', 'm', 'h', 'd', 'w' the times are instead expressed in the requested unit.
...: Additional arguments ignored.
Details
If filter_gc == TRUE (the default), runs that contain a garbage collection will be removed before summarizing. This is most useful for fast expressions where the majority of runs do not contain a gc. Call summary(filter_gc = FALSE) if you would like to compute summaries that include those times, for example with allocation-heavy expressions where all or most runs contain a gc.
Value
A tibble with the additional summary columns. The following summary columns are computed:
- expression (bench_expr) - The deparsed expression that was evaluated (or its name if one was provided).
- min (bench_time) - The minimum execution time.
- median (bench_time) - The sample median of execution time.
- itr/sec (double) - The estimated number of executions performed per second.
- mem_alloc (bench_bytes) - Total amount of memory allocated by R while running the expression. Memory allocated outside the R heap, e.g. by malloc() or new directly, is not tracked; take care to avoid misinterpreting the results if running code that may do this.
- gc/sec (double) - The number of garbage collections per second.
- n_itr (integer) - Total number of iterations after filtering garbage collections (if filter_gc == TRUE).
- n_gc (double) - Total number of garbage collections performed over all iterations. This is a pseudo-measure of the pressure on the garbage collector; if it varies greatly between two alternatives, the one with fewer collections will generally cause fewer allocations in real usage.
- total_time (bench_time) - The total time to perform the benchmarks.
- result (list) - A list column of the object(s) returned by the evaluated expression(s).
- memory (list) - A list column with results from Rprofmem().
- time (list) - A list column of bench_time vectors for each evaluated expression.
- gc (list) - A list column with tibbles containing the level of garbage collection (0-2, columns) for each iteration (rows).
Examples
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))
# `bench::mark()` implicitly calls summary() on its results
results <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500))
# However, you can also call summary() explicitly to filter garbage
# collections differently.
summary(results, filter_gc = FALSE)
# Or to output relative times
summary(results, relative = TRUE)
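# Not from the original manual: the time_unit argument can also be set
# explicitly, assuming "ms" is a supported unit (see the time_unit argument
# above), to report all times in milliseconds.
summary(results, time_unit = "ms")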
Workout a group of expressions individually
Description
Given a block of expressions in { }, workout() individually times each expression in the group. workout_expressions() is a lower level function, most useful when reading lists of calls from a file.
Usage
workout(expr, description = NULL)
workout_expressions(exprs, env = parent.frame(), description = NULL)
Arguments
expr: One or more expressions to workout; use { } to pass multiple expressions.
description: A name to label each expression; if not supplied, the deparsed expression will be used.
exprs: A list of calls to measure.
env: The environment in which the expressions should be evaluated.
Examples
workout({
  x <- 1:1000
  evens <- x %% 2 == 0
  y <- x[evens]
  length(y)
  length(which(evens))
  sum(evens)
})
# The equivalent of the above, reading the code from a file
workout_expressions(as.list(parse(system.file("examples/exprs.R", package = "bench"))))