Type: | Package |
Title: | Genome Interval Arithmetic |
Version: | 0.8.4 |
Description: | Read and manipulate genome intervals and signals. Provides functionality similar to command-line tool suites within R, enabling interactive analysis and visualization of genome-scale data. Riemondy et al. (2017) <doi:10.12688/f1000research.11997.1>. |
License: | MIT + file LICENSE |
URL: | https://github.com/rnabioco/valr/, https://rnabioco.github.io/valr/ |
BugReports: | https://github.com/rnabioco/valr/issues |
Depends: | R (≥ 4.0.0) |
Imports: | broom, cli, cpp11bigwig, dplyr (≥ 0.8.0), ggplot2, lifecycle, Rcpp (≥ 1.0.0), readr, rlang, stringr, tibble (≥ 1.4.2) |
Suggests: | bench, covr, cowplot, curl, DBI, dbplyr, devtools, DT, GenomicRanges, IRanges, knitr, purrr, RMariaDB, rmarkdown, S4Vectors, testthat (≥ 3.0.0), vdiffr (≥ 1.0.0), tidyr |
LinkingTo: | Rcpp (≥ 1.0.0) |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Config/Needs/website: | rnabioco/rbitemplate |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | true |
NeedsCompilation: | yes |
Packaged: | 2025-06-22 17:51:21 UTC; kriemo |
Author: | Jay Hesselberth |
Maintainer: | Kent Riemondy <kent.riemondy@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-06-22 18:30:02 UTC |
valr: genome interval arithmetic in R
Description
valr provides tools to read and manipulate intervals and signals on a genome reference. valr was developed to facilitate interactive analysis of genome-scale data sets, leveraging the power of dplyr and piping.
Details
To learn more about valr, start with the vignette:
browseVignettes(package = "valr")
Author(s)
Jay Hesselberth jay.hesselberth@gmail.com
Kent Riemondy kent.riemondy@gmail.com
See Also
Report bugs at https://github.com/rnabioco/valr/issues
Convert BED12 to individual exons in BED6.
Description
After conversion to BED6 format, the score column contains the exon number, with respect to strand (i.e., the first exon for - strand genes will have larger start and end coordinates).
Usage
bed12_to_exons(x)
Arguments
x |
See Also
Other utilities: bed_makewindows(), bound_intervals(), flip_strands(), interval_spacing()
Examples
x <- read_bed12(valr_example("mm9.refGene.bed.gz"))
bed12_to_exons(x)
Compute absolute distances between intervals.
Description
Computes the absolute distance between the midpoint of each x interval and the midpoint of its closest y interval.
Usage
bed_absdist(x, y, genome)
Arguments
x |
|
y |
|
genome |
Details
Absolute distances are scaled by the inter-reference gap for the chromosome as follows. For Q query points and R reference points on a chromosome, scale the distance for each query point i to the closest reference point by the inter-reference gap for each chromosome. If an x interval has no matching y chromosome, .absdist is NA.
d_i(x,y) = \min_k(|q_i - r_k|) \frac{R}{\text{Length of chromosome}}
Both absolute and scaled distances are reported as .absdist and .absdist_scaled.
Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.
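The scaling in the formula above can be illustrated with a small base R sketch. This is not valr's implementation: absdist_sketch() and its arguments are hypothetical names, and the sketch assumes midpoints on a single chromosome have already been computed.

```r
# Hypothetical helper: absolute and scaled distances for one chromosome.
# q: midpoints of query (x) intervals; r: midpoints of reference (y) intervals.
absdist_sketch <- function(q, r, chrom_size) {
  # distance from each query midpoint to its closest reference midpoint
  absdist <- vapply(q, function(qi) min(abs(qi - r)), numeric(1))
  # scale by R (number of reference points) over the chromosome length
  scaled <- absdist * length(r) / chrom_size
  data.frame(q = q, .absdist = absdist, .absdist_scaled = scaled)
}

res <- absdist_sketch(q = c(150, 900), r = c(100, 500, 1000), chrom_size = 3000)
res
# .absdist is 50 and 100; .absdist_scaled is 0.05 and 0.1
```

With three reference points on a 3000 bp chromosome, the inter-reference gap is 1000 bp, so a raw distance of 50 bp scales to 0.05.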
Value
ivl_df with .absdist and .absdist_scaled columns.
See Also
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002529
Other interval statistics: bed_fisher(), bed_jaccard(), bed_projection(), bed_reldist()
Examples
genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))
x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)
bed_absdist(x, y, genome)
Identify closest intervals.
Description
Identify closest intervals.
Usage
bed_closest(x, y, overlap = TRUE, suffix = c(".x", ".y"))
Arguments
x |
|
y |
|
overlap |
report overlapping intervals |
suffix |
colname suffixes in output |
Details
Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().
Value
ivl_df with additional columns:
- .overlap: amount of overlap with overlapping interval. Non-overlapping or adjacent intervals have an overlap of 0. .overlap will not be included in the output if overlap = FALSE.
- .dist: distance to closest interval. Negative distances denote upstream intervals. Book-ended intervals have a distance of 1.
Note
For each interval in x, bed_closest() returns overlapping intervals from y and the closest non-intersecting y interval. Setting overlap = FALSE will report the closest non-intersecting y intervals, ignoring any overlapping y intervals.
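The .overlap and .dist conventions described above can be sketched in base R for a single pair of intervals. closest_sketch() is a hypothetical helper, not part of valr. It assumes half-open [start, end) coordinates and ignores strand, and the +1 offset is an assumption chosen so that book-ended intervals get a distance of 1, as stated in the Value section.

```r
# Hypothetical helper: overlap and signed distance between one x interval and
# one y interval on the same chromosome (half-open coordinates, no strand).
closest_sketch <- function(x_start, x_end, y_start, y_end) {
  overlap <- max(0, min(x_end, y_end) - max(x_start, y_start))
  if (overlap > 0) {
    dist <- 0
  } else if (y_end <= x_start) {
    # y lies upstream (to the left) of x: negative distance
    dist <- -(x_start - y_end + 1)
  } else {
    # y lies downstream (to the right) of x: positive distance
    dist <- y_start - x_end + 1
  }
  c(.overlap = overlap, .dist = dist)
}

closest_sketch(100, 125, 25, 50)    # upstream y: .overlap 0, .dist -51
closest_sketch(100, 125, 140, 175)  # downstream y: .overlap 0, .dist 16
closest_sketch(100, 125, 125, 150)  # book-ended y: .overlap 0, .dist 1
```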
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/closest.html
Other multiple set operations: bed_coverage(), bed_intersect(), bed_map(), bed_subtract(), bed_window()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 100, 125
)
y <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 25, 50,
"chr1", 140, 175
)
bed_glyph(bed_closest(x, y))
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 500, 600,
"chr2", 5000, 6000
)
y <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 100, 200,
"chr1", 150, 200,
"chr1", 550, 580,
"chr2", 7000, 8500
)
bed_closest(x, y)
bed_closest(x, y, overlap = FALSE)
# Report distance based on strand
x <- tibble::tribble(
~chrom, ~start, ~end, ~name, ~score, ~strand,
"chr1", 10, 20, "a", 1, "-"
)
y <- tibble::tribble(
~chrom, ~start, ~end, ~name, ~score, ~strand,
"chr1", 8, 9, "b", 1, "+",
"chr1", 21, 22, "b", 1, "-"
)
res <- bed_closest(x, y)
# convert distance based on strand
res$.dist_strand <- ifelse(res$strand.x == "+", res$.dist, -(res$.dist))
res
# report absolute distances
res$.abs_dist <- abs(res$.dist)
res
Cluster neighboring intervals.
Description
The output .id column can be used in downstream grouping operations. Default max_dist = 0 means that both overlapping and book-ended intervals will be clustered.
Usage
bed_cluster(x, max_dist = 0)
Arguments
x |
|
max_dist |
maximum distance between clustered intervals. |
Details
Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().
Value
ivl_df with .id column specifying sets of clustered intervals.
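As a rough illustration of how cluster identifiers can be assigned, the base R sketch below walks intervals from one chromosome in sorted order and starts a new cluster whenever the gap to the running cluster end exceeds max_dist. cluster_sketch() is a hypothetical name and this is only one plausible approach, not valr's implementation.

```r
# Hypothetical helper: assign cluster ids to intervals on a single chromosome.
cluster_sketch <- function(start, end, max_dist = 0) {
  ids <- integer(length(start))
  current <- 0L
  reach <- -Inf  # right-most end seen so far in the current cluster
  for (i in order(start, end)) {
    # a gap larger than max_dist opens a new cluster
    if (start[i] - reach > max_dist) current <- current + 1L
    ids[i] <- current
    reach <- max(reach, end[i])
  }
  ids
}

# overlapping (1-10, 5-20) and book-ended (30-40, 40-50) intervals share an id
cluster_sketch(start = c(1, 5, 30, 40, 80), end = c(10, 20, 40, 50, 90))
# returns 1 1 2 2 3
```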
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/cluster.html
Other single set operations: bed_complement(), bed_flank(), bed_genomecov(), bed_merge(), bed_partition(), bed_shift(), bed_slop()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 100, 200,
"chr1", 180, 250,
"chr1", 250, 500,
"chr1", 501, 1000,
"chr2", 1, 100,
"chr2", 150, 200
)
bed_cluster(x)
# glyph illustrating clustering of overlapping and book-ended intervals
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 1, 10,
"chr1", 5, 20,
"chr1", 30, 40,
"chr1", 40, 50,
"chr1", 80, 90
)
bed_glyph(bed_cluster(x), label = ".id")
Identify intervals in a genome not covered by a query.
Description
Identify intervals in a genome not covered by a query.
Usage
bed_complement(x, genome)
Arguments
x |
|
genome |
Value
See Also
Other single set operations: bed_cluster(), bed_flank(), bed_genomecov(), bed_merge(), bed_partition(), bed_shift(), bed_slop()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 0, 10,
"chr1", 75, 100
)
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 200
)
bed_glyph(bed_complement(x, genome))
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 500,
"chr2", 600,
"chr3", 800
)
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 100, 300,
"chr1", 200, 400,
"chr2", 0, 100,
"chr2", 200, 400,
"chr3", 500, 600
)
# intervals not covered by x
bed_complement(x, genome)
Compute coverage of intervals.
Description
Compute coverage of intervals.
Usage
bed_coverage(x, y, ...)
Arguments
x |
|
y |
|
... |
extra arguments (not used) |
Details
Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().
Value
ivl_df with the following additional columns:
- .ints: number of x intersections
- .cov: per-base coverage of x intervals
- .len: total length of y intervals covered by x intervals
- .frac: .len scaled by the number of y intervals
Note
Book-ended intervals are included in coverage calculations.
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/coverage.html
Other multiple set operations: bed_closest(), bed_intersect(), bed_map(), bed_subtract(), bed_window()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end, ~strand,
"chr1", 100, 500, "+",
"chr2", 200, 400, "+",
"chr2", 300, 500, "-",
"chr2", 800, 900, "-"
)
y <- tibble::tribble(
~chrom, ~start, ~end, ~value, ~strand,
"chr1", 150, 400, 100, "+",
"chr1", 500, 550, 100, "+",
"chr2", 230, 430, 200, "-",
"chr2", 350, 430, 300, "-"
)
bed_coverage(x, y)
Fisher's test to measure overlap between two sets of intervals.
Description
Calculate Fisher's test on the number of intervals that are shared and unique between two sets of x and y intervals.
Usage
bed_fisher(x, y, genome)
Arguments
x |
|
y |
|
genome |
Details
Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.
Value
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/fisher.html
Other interval statistics: bed_absdist(), bed_jaccard(), bed_projection(), bed_reldist()
Examples
genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))
x <- bed_random(genome, n = 1e4, seed = 1010486)
y <- bed_random(genome, n = 1e4, seed = 9203911)
bed_fisher(x, y, genome)
Create flanking intervals from input intervals.
Description
Create flanking intervals from input intervals.
Usage
bed_flank(
x,
genome,
both = 0,
left = 0,
right = 0,
fraction = FALSE,
strand = FALSE,
trim = FALSE,
...
)
Arguments
x |
|
genome |
|
both |
number of bases on both sides |
left |
number of bases on left side |
right |
number of bases on right side |
fraction |
define flanks based on fraction of interval length |
strand |
define |
trim |
adjust coordinates for out-of-bounds intervals |
... |
extra arguments (not used) |
Value
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/flank.html
Other single set operations: bed_cluster(), bed_complement(), bed_genomecov(), bed_merge(), bed_partition(), bed_shift(), bed_slop()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 25, 50,
"chr1", 100, 125
)
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 130
)
bed_glyph(bed_flank(x, genome, both = 20))
x <- tibble::tribble(
~chrom, ~start, ~end, ~name, ~score, ~strand,
"chr1", 500, 1000, ".", ".", "+",
"chr1", 1000, 1500, ".", ".", "-"
)
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 5000
)
bed_flank(x, genome, left = 100)
bed_flank(x, genome, right = 100)
bed_flank(x, genome, both = 100)
bed_flank(x, genome, both = 0.5, fraction = TRUE)
Calculate coverage across a genome
Description
This function is useful for calculating interval coverage across an entire genome.
Usage
bed_genomecov(x, genome, zero_depth = FALSE)
Arguments
x |
|
genome |
|
zero_depth |
If TRUE, report intervals with zero depth. Zero depth intervals will be reported with respect to groups. |
Details
Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().
Value
ivl_df with an additional column:
- .depth: depth of interval coverage
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html
Other single set operations: bed_cluster(), bed_complement(), bed_flank(), bed_merge(), bed_partition(), bed_shift(), bed_slop()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end, ~strand,
"chr1", 20, 70, "+",
"chr1", 50, 100, "-",
"chr1", 200, 250, "+",
"chr1", 220, 250, "+"
)
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 500,
"chr2", 1000
)
bed_genomecov(x, genome)
bed_genomecov(dplyr::group_by(x, strand), genome)
bed_genomecov(dplyr::group_by(x, strand), genome, zero_depth = TRUE)
Create example glyphs for valr functions.
Description
Used to illustrate the output of valr functions with small examples.
Usage
bed_glyph(expr, label = NULL)
Arguments
expr |
expression to evaluate |
label |
column name to use for label values. Should be present in the result of the call. |
Value
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 25, 50,
"chr1", 100, 125
)
y <- tibble::tribble(
~chrom, ~start, ~end, ~value,
"chr1", 30, 75, 50
)
bed_glyph(bed_intersect(x, y))
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 30, 75,
"chr1", 50, 90,
"chr1", 91, 120
)
bed_glyph(bed_merge(x))
bed_glyph(bed_cluster(x), label = ".id")
Identify intersecting intervals.
Description
Report intersecting intervals from x and y tbls. Book-ended intervals have .overlap values of 0 in the output.
Usage
bed_intersect(x, ..., invert = FALSE, suffix = c(".x", ".y"))
Arguments
x |
|
... |
one or more (e.g. a list of) |
invert |
report |
suffix |
colname suffixes in output |
Details
Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().
Value
ivl_df with original columns from x and y suffixed with .x and .y, and a new .overlap column with the extent of overlap for the intersecting intervals.
If multiple y tbls are supplied, the .source column contains variable names associated with each interval. All original columns from the y tbls are suffixed with .y in the output.
If ... contains named inputs (e.g., a = y, b = z or list(a = y, b = z)), then .source will contain the supplied names (see examples).
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html
Other multiple set operations: bed_closest(), bed_coverage(), bed_map(), bed_subtract(), bed_window()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 25, 50,
"chr1", 100, 125
)
y <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 30, 75
)
bed_glyph(bed_intersect(x, y))
bed_glyph(bed_intersect(x, y, invert = TRUE))
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 100, 500,
"chr2", 200, 400,
"chr2", 300, 500,
"chr2", 800, 900
)
y <- tibble::tribble(
~chrom, ~start, ~end, ~value,
"chr1", 150, 400, 100,
"chr1", 500, 550, 100,
"chr2", 230, 430, 200,
"chr2", 350, 430, 300
)
bed_intersect(x, y)
bed_intersect(x, y, invert = TRUE)
# start and end of each overlapping interval
res <- bed_intersect(x, y)
dplyr::mutate(res,
start = pmax(start.x, start.y),
end = pmin(end.x, end.y)
)
z <- tibble::tribble(
~chrom, ~start, ~end, ~value,
"chr1", 150, 400, 100,
"chr1", 500, 550, 100,
"chr2", 230, 430, 200,
"chr2", 750, 900, 400
)
bed_intersect(x, y, z)
bed_intersect(x, exons = y, introns = z)
# a list of tbl_intervals can also be passed
bed_intersect(x, list(exons = y, introns = z))
Calculate the Jaccard statistic for two sets of intervals.
Description
Quantifies the extent of overlap between two sets of intervals in terms of base-pairs. Groups that are shared between inputs are used to calculate the statistic for subsets of data.
Usage
bed_jaccard(x, y)
Arguments
x |
|
y |
Details
The Jaccard statistic takes values of [0,1] and is measured as:
J(x,y) = \frac{\mid x \bigcap y \mid}{\mid x \bigcup y \mid} = \frac{\mid x \bigcap y \mid}{\mid x \mid + \mid y \mid - \mid x \bigcap y \mid}
Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.
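The Jaccard formula above can be checked by hand on toy data. The base R sketch below expands small intervals into the sets of bases they cover and compares the two forms of the denominator. covered_bases() is a hypothetical helper that assumes half-open [start, end) coordinates. It illustrates the formula only and is not how valr computes the statistic.

```r
# Hypothetical helper: expand half-open [start, end) intervals into the set of
# bases they cover. Only sensible for small toy coordinates.
covered_bases <- function(start, end) {
  unique(unlist(Map(function(s, e) s + seq_len(e - s) - 1, start, end)))
}

x_bases <- covered_bases(start = c(0, 30), end = c(10, 40))  # 20 bases
y_bases <- covered_bases(start = 5, end = 35)                # 30 bases

n_shared <- length(intersect(x_bases, y_bases))  # bases covered by both x and y
n_union <- length(union(x_bases, y_bases))       # bases covered by x or y

# the two forms of the denominator agree: |x| + |y| - |x and y| equals |x or y|
stopifnot(n_union == length(x_bases) + length(y_bases) - n_shared)

c(n_shared = n_shared, n_union = n_union, jaccard = n_shared / n_union)
# n_shared is 10, n_union is 40, jaccard is 0.25
```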
Value
tibble with the following columns:
- len_i: length of the intersection in base-pairs
- len_u: length of the union in base-pairs
- jaccard: value of jaccard statistic
- n_int: number of intersecting intervals between x and y
If inputs are grouped, the return value will contain one set of values per group.
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/jaccard.html
Other interval statistics: bed_absdist(), bed_fisher(), bed_projection(), bed_reldist()
Examples
genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))
x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)
bed_jaccard(x, y)
# calculate jaccard per chromosome
bed_jaccard(
dplyr::group_by(x, chrom),
dplyr::group_by(y, chrom)
)
Divide intervals into new sub-intervals ("windows").
Description
Divide intervals into new sub-intervals ("windows").
Usage
bed_makewindows(x, win_size = 0, step_size = 0, num_win = 0, reverse = FALSE)
Arguments
x |
|
win_size |
divide intervals into fixed-size windows |
step_size |
size to step before next window |
num_win |
divide intervals into a fixed number of windows |
reverse |
reverse window numbers |
Value
ivl_df with .win_id column that contains a numeric identifier for the window.
Note
The name and .win_id columns can be used to create new interval names (see 'namenum' example below) or in subsequent group_by operations (see vignette).
See Also
Other utilities: bed12_to_exons(), bound_intervals(), flip_strands(), interval_spacing()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end, ~name, ~score, ~strand,
"chr1", 100, 200, "A", ".", "+"
)
bed_glyph(bed_makewindows(x, num_win = 10), label = ".win_id")
# Fixed number of windows
bed_makewindows(x, num_win = 10)
# Fixed window size
bed_makewindows(x, win_size = 10)
# Fixed window size with overlaps
bed_makewindows(x, win_size = 10, step_size = 5)
# reverse win_id
bed_makewindows(x, win_size = 10, reverse = TRUE)
# bedtools 'namenum'
wins <- bed_makewindows(x, win_size = 10)
dplyr::mutate(wins, namenum = stringr::str_c(name, "_", .win_id))
Calculate summaries from overlapping intervals.
Description
Apply functions like min() and max() to intersecting intervals. bed_map() uses bed_intersect() to identify intersecting intervals, so output columns will be suffixed with .x and .y. Expressions that refer to input columns from x and y must take these suffixes into account.
Usage
bed_map(x, y, ..., min_overlap = 1)
concat(.data, sep = ",")
values_unique(.data, sep = ",")
values(.data, sep = ",")
Arguments
x |
|
y |
|
... |
name-value pairs specifying column names and expressions to apply |
min_overlap |
minimum overlap for intervals. |
.data |
data |
sep |
separator character |
Details
Book-ended intervals can be included by setting min_overlap = 0.
Non-intersecting intervals from x are included in the result with NA values.
Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().
Value
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/map.html
Other multiple set operations: bed_closest(), bed_coverage(), bed_intersect(), bed_subtract(), bed_window()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
'chr1', 100, 250,
'chr2', 250, 500
)
y <- tibble::tribble(
~chrom, ~start, ~end, ~value,
'chr1', 100, 250, 10,
'chr1', 150, 250, 20,
'chr2', 250, 500, 500
)
bed_glyph(bed_map(x, y, value = sum(value)), label = 'value')
# summary examples
bed_map(x, y, .sum = sum(value))
bed_map(x, y, .min = min(value), .max = max(value))
# identify non-intersecting intervals to include in the result
res <- bed_map(x, y, .sum = sum(value))
x_not <- bed_intersect(x, y, invert = TRUE)
dplyr::bind_rows(res, x_not)
# create a list-column
bed_map(x, y, .values = list(value))
# use `nth` family from dplyr
bed_map(x, y, .first = dplyr::first(value))
bed_map(x, y, .absmax = abs(max(value)))
bed_map(x, y, .count = length(value))
bed_map(x, y, .vals = values(value))
# count defaults are NA not 0; differs from bedtools2 ...
bed_map(x, y, .counts = dplyr::n())
# ... but NA counts can be converted to 0's
dplyr::mutate(
bed_map(x, y, .counts = dplyr::n()),
.counts = ifelse(is.na(.counts), 0, .counts)
)
Merge overlapping intervals.
Description
Operations can be performed on merged intervals by specifying name-value pairs. Default max_dist of 0 means book-ended intervals are merged.
Usage
bed_merge(x, max_dist = 0, ...)
Arguments
x |
|
max_dist |
maximum distance between intervals to merge |
... |
name-value pairs that specify operations on merged intervals |
Details
Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().
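The effect of max_dist on merging can be sketched in base R for intervals on a single chromosome. merge_sketch() is a hypothetical helper written for illustration. It assumes that intervals whose gap is at most max_dist are merged, which matches the statement that book-ended intervals (gap of 0) merge by default, but it is not valr's implementation.

```r
# Hypothetical helper: merge intervals on one chromosome whose gap is at most
# max_dist; overlapping and book-ended intervals merge when max_dist = 0.
merge_sketch <- function(start, end, max_dist = 0) {
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  merged_start <- numeric(0)
  merged_end <- numeric(0)
  for (i in seq_along(start)) {
    n <- length(merged_end)
    if (n > 0 && start[i] - merged_end[n] <= max_dist) {
      # close enough to the current merged interval: extend it
      merged_end[n] <- max(merged_end[n], end[i])
    } else {
      # otherwise open a new merged interval
      merged_start <- c(merged_start, start[i])
      merged_end <- c(merged_end, end[i])
    }
  }
  data.frame(start = merged_start, end = merged_end)
}

merge_sketch(c(1, 10, 100), c(50, 75, 120))                 # 1-75 and 100-120
merge_sketch(c(1, 10, 100), c(50, 75, 120), max_dist = 25)  # 1-120
```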
Value
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/merge.html
Other single set operations: bed_cluster(), bed_complement(), bed_flank(), bed_genomecov(), bed_partition(), bed_shift(), bed_slop()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 1, 50,
"chr1", 10, 75,
"chr1", 100, 120
)
bed_glyph(bed_merge(x))
x <- tibble::tribble(
~chrom, ~start, ~end, ~value, ~strand,
"chr1", 1, 50, 1, "+",
"chr1", 100, 200, 2, "+",
"chr1", 150, 250, 3, "-",
"chr2", 1, 25, 4, "+",
"chr2", 200, 400, 5, "-",
"chr2", 400, 500, 6, "+",
"chr2", 450, 550, 7, "+"
)
bed_merge(x)
bed_merge(x, max_dist = 100)
# merge intervals on same strand
bed_merge(dplyr::group_by(x, strand))
bed_merge(x, .value = sum(value))
Partition intervals into elemental intervals
Description
Convert a set of intervals into elemental intervals that contain each start and end position in the set.
Usage
bed_partition(x, ...)
Arguments
x |
|
... |
name-value pairs specifying column names and expressions to apply |
Details
Summary operations, such as min() or max(), can be performed on elemental intervals by specifying name-value pairs.
This function is useful for calculating summaries across overlapping intervals without merging the intervals.
Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().
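One way to picture elemental intervals is to cut the chromosome at every start and end position and keep the pieces that fall inside at least one input interval. The base R sketch below does this for a single chromosome. partition_sketch() is a hypothetical name, and the approach is an assumption made for illustration rather than a description of valr's algorithm.

```r
# Hypothetical helper: elemental intervals for intervals on one chromosome.
partition_sketch <- function(start, end) {
  # every start and end position becomes a breakpoint
  breaks <- sort(unique(c(start, end)))
  el_start <- head(breaks, -1)
  el_end <- tail(breaks, -1)
  # keep only pieces that lie inside at least one input interval
  covered <- vapply(seq_along(el_start), function(i) {
    any(start <= el_start[i] & end >= el_end[i])
  }, logical(1))
  data.frame(start = el_start[covered], end = el_end[covered])
}

res <- partition_sketch(
  start = c(100, 200, 300, 550, 800),
  end = c(500, 400, 550, 575, 900)
)
res
# 7 rows; the uncovered gap 575-800 is dropped
```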
Value
See Also
Other single set operations: bed_cluster(), bed_complement(), bed_flank(), bed_genomecov(), bed_merge(), bed_shift(), bed_slop()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end, ~value, ~strand,
"chr1", 100, 500, 10, "+",
"chr1", 200, 400, 20, "-",
"chr1", 300, 550, 30, "+",
"chr1", 550, 575, 2, "+",
"chr1", 800, 900, 5, "+"
)
bed_glyph(bed_partition(x))
bed_glyph(bed_partition(x, value = sum(value)), label = "value")
bed_partition(x)
# compute summary over each elemental interval
bed_partition(x, value = sum(value))
# partition and compute summaries based on group
x <- dplyr::group_by(x, strand)
bed_partition(x, value = sum(value))
# combine values across multiple tibbles
y <- tibble::tribble(
~chrom, ~start, ~end, ~value, ~strand,
"chr1", 10, 500, 100, "+",
"chr1", 250, 420, 200, "-",
"chr1", 350, 550, 300, "+",
"chr1", 550, 555, 20, "+",
"chr1", 800, 900, 50, "+"
)
x <- dplyr::bind_rows(x, y)
bed_partition(x, value = sum(value))
Projection test for query interval overlap.
Description
Projection test for query interval overlap.
Usage
bed_projection(x, y, genome, by_chrom = FALSE)
Arguments
x |
|
y |
|
genome |
|
by_chrom |
compute test per chromosome |
Details
Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.
Value
ivl_df with the following columns:
- chrom: the name of chromosome tested if by_chrom = TRUE, otherwise has a value of whole_genome
- p.value: p-value from a binomial test. p-values > 0.5 are converted to 1 - p-value and lower_tail is FALSE
- obs_exp_ratio: ratio of observed to expected overlap frequency
- lower_tail: TRUE indicates the observed overlaps are in the lower tail of the distribution (e.g., less overlap than expected). FALSE indicates that the observed overlaps are in the upper tail of the distribution (e.g., more overlap than expected)
See Also
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002529
Other interval statistics: bed_absdist(), bed_fisher(), bed_jaccard(), bed_reldist()
Examples
genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))
x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)
bed_projection(x, y, genome)
bed_projection(x, y, genome, by_chrom = TRUE)
Generate randomly placed intervals on a genome.
Description
Generate randomly placed intervals on a genome.
Usage
bed_random(genome, length = 1000, n = 1e+06, seed = 0, sorted = TRUE)
Arguments
genome |
|
length |
length of intervals |
n |
number of intervals to generate |
seed |
seed RNG for reproducible intervals |
sorted |
return sorted output |
Details
Sorting can be suppressed with sorted = FALSE.
Value
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/random.html
Other randomizing operations: bed_shuffle()
Examples
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 10000000,
"chr2", 50000000,
"chr3", 60000000,
"chrX", 5000000
)
bed_random(genome, seed = 10104)
# sorting can be suppressed
bed_random(genome, sorted = FALSE, seed = 10104)
# 500 random intervals of length 500
bed_random(genome, length = 500, n = 500, seed = 10104)
Compute relative distances between intervals.
Description
Compute relative distances between intervals.
Usage
bed_reldist(x, y, detail = FALSE)
Arguments
x |
|
y |
|
detail |
report relative distances for each x interval |
Details
Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.
Value
If detail = FALSE, an ivl_df that summarizes calculated .reldist values with the following columns:
- .reldist: relative distance metric
- .counts: number of metric observations
- .total: total observations
- .freq: frequency of observation
If detail = TRUE, the .reldist column reports the relative distance for each input x interval.
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/reldist.html
Other interval statistics: bed_absdist(), bed_fisher(), bed_jaccard(), bed_projection()
Examples
genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))
x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)
bed_reldist(x, y)
bed_reldist(x, y, detail = TRUE)
Adjust intervals by a fixed size.
Description
Out-of-bounds intervals are removed by default.
Usage
bed_shift(x, genome, size = 0, fraction = 0, trim = FALSE)
Arguments
x |
|
genome |
|
size |
number of bases to shift. positive numbers shift right, negative shift left. |
fraction |
define |
trim |
adjust coordinates for out-of-bounds intervals |
Value
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/shift.html
Other single set operations: bed_cluster(), bed_complement(), bed_flank(), bed_genomecov(), bed_merge(), bed_partition(), bed_slop()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 25, 50,
"chr1", 100, 125
)
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 125
)
bed_glyph(bed_shift(x, genome, size = -20))
x <- tibble::tribble(
~chrom, ~start, ~end, ~strand,
"chr1", 100, 150, "+",
"chr1", 200, 250, "+",
"chr2", 300, 350, "+",
"chr2", 400, 450, "-",
"chr3", 500, 550, "-",
"chr3", 600, 650, "-"
)
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 1000,
"chr2", 2000,
"chr3", 3000
)
bed_shift(x, genome, 100)
bed_shift(x, genome, fraction = 0.5)
# shift with respect to strand
stranded <- dplyr::group_by(x, strand)
bed_shift(stranded, genome, 100)
Shuffle input intervals.
Description
Shuffle input intervals.
Usage
bed_shuffle(
x,
genome,
incl = NULL,
excl = NULL,
max_tries = 1000,
within = FALSE,
seed = 0
)
Arguments
x |
|
genome |
|
incl |
ivl_df of included intervals |
excl |
ivl_df of excluded intervals |
max_tries |
maximum tries to identify a bounded interval |
within |
shuffle within chromosomes |
seed |
seed for reproducible intervals |
Value
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/shuffle.html
Other randomizing operations: bed_random()
Examples
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 1e6,
"chr2", 2e6,
"chr3", 4e6
)
x <- bed_random(genome, seed = 1010486)
bed_shuffle(x, genome, seed = 9830491)
Increase the size of input intervals.
Description
Increase the size of input intervals.
Usage
bed_slop(
x,
genome,
both = 0,
left = 0,
right = 0,
fraction = FALSE,
strand = FALSE,
trim = FALSE,
...
)
Arguments
x |
|
genome |
|
both |
number of bases on both sides |
left |
number of bases on left side |
right |
number of bases on right side |
fraction |
define flanks based on fraction of interval length |
strand |
define |
trim |
adjust coordinates for out-of-bounds intervals |
... |
extra arguments (not used) |
Value
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/slop.html
Other single set operations: bed_cluster(), bed_complement(), bed_flank(), bed_genomecov(), bed_merge(), bed_partition(), bed_shift()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 110, 120,
"chr1", 225, 235
)
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 400
)
bed_glyph(bed_slop(x, genome, both = 20, trim = TRUE))
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 5000
)
x <- tibble::tribble(
~chrom, ~start, ~end, ~name, ~score, ~strand,
"chr1", 500, 1000, ".", ".", "+",
"chr1", 1000, 1500, ".", ".", "-"
)
bed_slop(x, genome, left = 100)
bed_slop(x, genome, right = 100)
bed_slop(x, genome, both = 100)
bed_slop(x, genome, both = 0.5, fraction = TRUE)
Sort a set of intervals.
Description
Sort a set of intervals.
Usage
bed_sort(x, by_size = FALSE, by_chrom = FALSE, reverse = FALSE)
Arguments
x |
|
by_size |
sort by interval size |
by_chrom |
sort within chromosome |
reverse |
reverse sort order |
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/sort.html
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr8", 500, 1000,
"chr8", 1000, 5000,
"chr8", 100, 200,
"chr1", 100, 300,
"chr1", 100, 200
)
# sort by chrom and start
bed_sort(x)
# reverse sort order
bed_sort(x, reverse = TRUE)
# sort by interval size
bed_sort(x, by_size = TRUE)
# sort by decreasing interval size
bed_sort(x, by_size = TRUE, reverse = TRUE)
# sort by interval size within chrom
bed_sort(x, by_size = TRUE, by_chrom = TRUE)
Subtract two sets of intervals.
Description
Subtract y intervals from x intervals.
Usage
bed_subtract(x, y, any = FALSE)
Arguments
x |
|
y |
|
any |
remove any |
Details
Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().
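Subtracting y intervals from a single x interval can be sketched in base R as a left-to-right sweep that keeps the stretches of x not covered by any y. subtract_sketch() is a hypothetical helper that assumes half-open coordinates on one chromosome. It is an illustration of the idea, not valr's implementation, and it does not model the any argument.

```r
# Hypothetical helper: remove the parts of one x interval covered by y intervals.
subtract_sketch <- function(x_start, x_end, y_start, y_end) {
  out_start <- numeric(0)
  out_end <- numeric(0)
  cursor <- x_start  # left edge of the part of x not yet examined
  for (i in order(y_start)) {
    # skip y intervals that do not overlap the remaining part of x
    if (y_end[i] <= cursor || y_start[i] >= x_end) next
    if (y_start[i] > cursor) {
      # the stretch before this y interval survives
      out_start <- c(out_start, cursor)
      out_end <- c(out_end, y_start[i])
    }
    cursor <- max(cursor, y_end[i])
  }
  if (cursor < x_end) {
    # whatever remains to the right of the last y interval survives
    out_start <- c(out_start, cursor)
    out_end <- c(out_end, x_end)
  }
  data.frame(start = out_start, end = out_end)
}

subtract_sketch(1, 100, 50, 75)                      # 1-50 and 75-100
subtract_sketch(500, 600, c(510, 550), c(525, 575))  # 500-510, 525-550, 575-600
subtract_sketch(1300, 1500, 1299, 1501)              # zero rows: x fully covered
```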
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/subtract.html
Other multiple set operations: bed_closest(), bed_coverage(), bed_intersect(), bed_map(), bed_window()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 1, 100
)
y <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 50, 75
)
bed_glyph(bed_subtract(x, y))
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 100, 200,
"chr1", 250, 400,
"chr1", 500, 600,
"chr1", 1000, 1200,
"chr1", 1300, 1500
)
y <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 150, 175,
"chr1", 510, 525,
"chr1", 550, 575,
"chr1", 900, 1050,
"chr1", 1150, 1250,
"chr1", 1299, 1501
)
bed_subtract(x, y)
bed_subtract(x, y, any = TRUE)
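To make the arithmetic behind subtraction concrete, the following base R sketch removes a set of y intervals from a single x interval, treating intervals as half-open [start, end). The helper subtract_one is hypothetical and is not how valr implements bed_subtract(); it only illustrates which pieces of x survive.

```r
# Hypothetical helper: subtract y intervals from one interval [start, end).
subtract_one <- function(start, end, y_start, y_end) {
  ord <- order(y_start)
  y_start <- y_start[ord]
  y_end <- y_end[ord]
  out_start <- numeric(0)
  out_end <- numeric(0)
  cursor <- start  # left edge of the part of x not yet processed
  for (i in seq_along(y_start)) {
    # skip y intervals that do not overlap the remaining part of x
    if (y_end[i] <= cursor || y_start[i] >= end) next
    if (y_start[i] > cursor) {
      # the gap before this y interval survives
      out_start <- c(out_start, cursor)
      out_end <- c(out_end, y_start[i])
    }
    cursor <- max(cursor, y_end[i])
    if (cursor >= end) break
  }
  if (cursor < end) {
    # whatever remains to the right of the last y interval survives
    out_start <- c(out_start, cursor)
    out_end <- c(out_end, end)
  }
  data.frame(start = out_start, end = out_end)
}

single <- subtract_one(1, 100, y_start = 50, y_end = 75)
double <- subtract_one(500, 600, y_start = c(510, 550), y_end = c(525, 575))
covered <- subtract_one(1300, 1500, y_start = 1299, y_end = 1501)
```

Here single holds the two flanking pieces of x, double holds three pieces, and covered is empty because y spans the whole x interval.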
Identify intervals within a specified distance.
Description
Identify intervals within a specified distance.
Usage
bed_window(x, y, genome, ...)
Arguments
x |
ivl_df |
y |
ivl_df |
genome |
genome_df |
... |
params for bed_slop and bed_intersect |
Details
Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/window.html
Other multiple set operations: bed_closest(), bed_coverage(), bed_intersect(), bed_map(), bed_subtract()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 25, 50,
"chr1", 100, 125
)
y <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 60, 75
)
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 125
)
bed_glyph(bed_window(x, y, genome, both = 15))
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 10, 100,
"chr2", 200, 400,
"chr2", 300, 500,
"chr2", 800, 900
)
y <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 150, 400,
"chr2", 230, 430,
"chr2", 350, 430
)
genome <- tibble::tribble(
~chrom, ~size,
"chr1", 500,
"chr2", 1000
)
bed_window(x, y, genome, both = 100)
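Since the arguments are passed to bed_slop and bed_intersect, a window query can be thought of as "expand x, then test for overlap with y". The base R sketch below illustrates that idea for a single chromosome. The helper within_window is hypothetical, and clamping the expanded window to the chromosome bounds is an assumption; valr may treat windows that run past a chromosome end differently.

```r
# Hypothetical helper: which x intervals have a y interval within `both` bases?
within_window <- function(x, y, chrom_size, both) {
  # expand each x interval on both sides, clamped to [0, chrom_size] (assumption)
  win_start <- pmax(x$start - both, 0)
  win_end <- pmin(x$end + both, chrom_size)
  # half-open intervals overlap when each one starts before the other ends
  vapply(seq_len(nrow(x)), function(i) {
    any(y$start < win_end[i] & y$end > win_start[i])
  }, logical(1))
}

x <- data.frame(start = c(25, 100), end = c(50, 125))
y <- data.frame(start = 60, end = 75)
hits <- within_window(x, y, chrom_size = 125, both = 15)
```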
Select intervals bounded by a genome.
Description
Used to remove out-of-bounds intervals, or trim interval coordinates using a genome.
Usage
bound_intervals(x, genome, trim = FALSE)
Arguments
x |
ivl_df |
genome |
genome_df |
trim |
adjust coordinates for out-of-bounds intervals |
Value
ivl_df
See Also
Other utilities: bed12_to_exons(), bed_makewindows(), flip_strands(), interval_spacing()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", -100, 500,
"chr1", 100, 1e9,
"chr1", 500, 1000
)
genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))
# out-of-bounds are removed by default ...
bound_intervals(x, genome)
# ... or can be trimmed within the bounds of a genome
bound_intervals(x, genome, trim = TRUE)
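The difference between removing and trimming can be sketched in base R for one chromosome. The helper bound_sketch is hypothetical and the chromosome size is hard-coded for illustration; this is a sketch of the idea, not valr's implementation.

```r
# Hypothetical helper: drop or trim intervals that fall outside [0, chrom_size].
bound_sketch <- function(x, chrom_size, trim = FALSE) {
  if (trim) {
    # clamp coordinates to the chromosome bounds
    x$start <- pmax(x$start, 0)
    x$end <- pmin(x$end, chrom_size)
    x
  } else {
    # keep only intervals that lie entirely within bounds
    x[x$start >= 0 & x$end <= chrom_size, , drop = FALSE]
  }
}

x <- data.frame(
  chrom = "chr1",
  start = c(-100, 100, 500),
  end   = c(500, 1e9, 1000)
)
chr1_size <- 249250621  # assumed chr1 size, hard-coded for this sketch

kept <- bound_sketch(x, chr1_size)
trimmed <- bound_sketch(x, chr1_size, trim = TRUE)
```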
Create intron features.
Description
Numbers in the score column are intron numbers from 5' to 3' independent of strand. I.e., the first introns for + and - strand genes both have score values of 1.
Usage
create_introns(x)
Arguments
x |
ivl_df in BED12 format |
See Also
Other feature functions: create_tss(), create_utrs3(), create_utrs5()
Examples
x <- read_bed12(valr_example("mm9.refGene.bed.gz"))
create_introns(x)
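The strand-independent numbering described above can be illustrated with a small base R sketch. Given the exons of one transcript sorted by start, introns are the gaps between consecutive exons, and the numbering runs in the opposite genomic direction for - strand transcripts. The helper introns_sketch is hypothetical and is not valr's implementation.

```r
# Hypothetical helper: introns of one transcript, numbered 5' to 3'.
# Assumes exons are sorted by start coordinate.
introns_sketch <- function(exon_start, exon_end, strand) {
  n <- length(exon_start)
  intron_start <- exon_end[-n]   # an intron begins where an exon ends
  intron_end <- exon_start[-1]   # and ends where the next exon begins
  k <- seq_along(intron_start)
  # on the - strand the 5' end is at the larger coordinate
  score <- if (strand == "-") rev(k) else k
  data.frame(start = intron_start, end = intron_end, score = score)
}

plus <- introns_sketch(c(100, 300, 500), c(200, 400, 600), "+")
minus <- introns_sketch(c(100, 300, 500), c(200, 400, 600), "-")
```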
Create transcription start site features.
Description
Create transcription start site features.
Usage
create_tss(x)
Arguments
x |
ivl_df in BED format |
See Also
Other feature functions: create_introns(), create_utrs3(), create_utrs5()
Examples
x <- read_bed12(valr_example("mm9.refGene.bed.gz"))
create_tss(x)
Create 3' UTR features.
Description
Create 3' UTR features.
Usage
create_utrs3(x)
Arguments
x |
ivl_df in BED12 format |
See Also
Other feature functions: create_introns(), create_tss(), create_utrs5()
Examples
x <- read_bed12(valr_example("mm9.refGene.bed.gz"))
create_utrs3(x)
Create 5' UTR features.
Description
Create 5' UTR features.
Usage
create_utrs5(x)
Arguments
x |
ivl_df in BED12 format |
See Also
Other feature functions: create_introns(), create_tss(), create_utrs3()
Examples
x <- read_bed12(valr_example("mm9.refGene.bed.gz"))
create_utrs5(x)
Fetch data from remote databases.
Description
Currently db_ucsc and db_ensembl are available for connections.
Usage
db_ucsc(
dbname,
host = "genome-mysql.cse.ucsc.edu",
user = "genomep",
password = "password",
port = 3306,
...
)
db_ensembl(
dbname,
host = "ensembldb.ensembl.org",
user = "anonymous",
password = "",
port = 3306,
...
)
Arguments
dbname |
name of database |
host |
hostname |
user |
username |
password |
password |
port |
MySQL connection port |
... |
params for connection |
See Also
https://genome.ucsc.edu/goldenpath/help/mysql.html
https://www.ensembl.org/info/data/mysql.html
Examples
## Not run:
if (require(RMariaDB)) {
library(dplyr)
ucsc <- db_ucsc("hg38")
# fetch the `refGene` tbl
tbl(ucsc, "refGene")
# the `chromInfo` tbls have size information
tbl(ucsc, "chromInfo")
}
## End(Not run)
## Not run:
if (require(RMariaDB)) {
library(dplyr)
# squirrel genome
ensembl <- db_ensembl("spermophilus_tridecemlineatus_core_67_2")
tbl(ensembl, "gene")
}
## End(Not run)
Flip strands in intervals.
Description
Flips positive (+) stranded intervals to negative (-) strands, and vice-versa. Facilitates comparisons among intervals on opposing strands.
Usage
flip_strands(x)
Arguments
x |
ivl_df |
See Also
Other utilities: bed12_to_exons(), bed_makewindows(), bound_intervals(), interval_spacing()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end, ~strand,
"chr1", 1, 100, "+",
"chr2", 1, 100, "-"
)
flip_strands(x)
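The flip itself is a simple swap of strand labels. A minimal base R sketch on a character vector follows; the helper flip_sketch is hypothetical, and leaving values other than + and - untouched is an assumption about how unstranded intervals would be treated.

```r
# Hypothetical helper: swap "+" and "-", leaving any other value unchanged.
flip_sketch <- function(strand) {
  flipped <- strand
  flipped[strand == "+"] <- "-"
  flipped[strand == "-"] <- "+"
  flipped
}

flipped <- flip_sketch(c("+", "-", "."))
```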
Convert GRanges to bed tibble
Description
Convert GRanges to bed tibble
Usage
gr_to_bed(x)
Arguments
x |
GRanges object to convert to bed tibble. |
Value
Examples
## Not run:
gr <- GenomicRanges::GRanges(
seqnames = S4Vectors::Rle(
c("chr1", "chr2", "chr1", "chr3"),
c(1, 1, 1, 1)
),
ranges = IRanges::IRanges(
start = c(1, 10, 50, 100),
end = c(100, 500, 1000, 2000),
names = head(letters, 4)
),
strand = S4Vectors::Rle(
c("-", "+"), c(2, 2)
)
)
gr_to_bed(gr)
# There are two ways to convert a bed-like data.frame to GRanges.
# `x` is assumed to be a bed-like data.frame with chrom, start, end,
# name and strand columns:
gr <- GenomicRanges::GRanges(
seqnames = S4Vectors::Rle(x$chrom),
ranges = IRanges::IRanges(
start = x$start + 1,
end = x$end,
names = x$name
),
strand = S4Vectors::Rle(x$strand)
)
# or:
gr <- GenomicRanges::makeGRangesFromDataFrame(dplyr::mutate(x, start = start + 1))
## End(Not run)
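The start + 1 in the conversions above reflects the two coordinate conventions: BED intervals are zero-based and half-open, while GRanges are one-based and closed. A base R sketch of the round trip is shown below; the helper names are hypothetical.

```r
# Hypothetical helpers converting between the two coordinate conventions.
# one-based, closed [start, end]  ->  zero-based, half-open [start - 1, end)
to_bed_coords <- function(start, end) {
  data.frame(start = start - 1, end = end)
}
# zero-based, half-open [start, end)  ->  one-based, closed [start + 1, end]
to_one_based <- function(start, end) {
  data.frame(start = start + 1, end = end)
}

bed <- to_bed_coords(start = c(1, 10), end = c(100, 500))
back <- to_one_based(bed$start, bed$end)

# the interval width is the same under both conventions
width_one_based <- c(100, 500) - c(1, 10) + 1
width_bed <- bed$end - bed$start
```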
Plyr function id packaged due to plyr being retired. Compute a unique numeric id for each unique row in a data frame.
Description
Properties:
- order(id) is equivalent to do.call(order, df)
- rows containing the same data have the same value
- if drop = FALSE then room for all possibilities
Usage
id(.variables, drop = FALSE)
Arguments
.variables |
list of variables |
drop |
drop unused factor levels? |
Value
a numeric vector with attribute n, giving total number of possibilities
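The first two properties can be illustrated with a base R sketch that ranks the distinct rows of a data frame. The helper row_id is hypothetical and is not the plyr algorithm (it does not compute the n attribute or handle drop); it only shows an id vector for which order(id) agrees with do.call(order, df).

```r
# Hypothetical helper: give each distinct row an integer id in sorted order.
row_id <- function(df) {
  cols <- unname(as.list(df))
  ord <- do.call(order, cols)                  # row order of the data frame
  key <- do.call(paste, c(cols, sep = "\r"))   # one string key per row
  match(key, unique(key[ord]))                 # rank of each row's key
}

df <- data.frame(
  chrom = c("chr2", "chr1", "chr2", "chr1"),
  start = c(10, 5, 10, 1)
)
ids <- row_id(df)
```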
See Also
Plyr function id_var packaged due to plyr being retired. Numeric id for a vector.
Description
Plyr function id_var packaged due to plyr being retired. Numeric id for a vector.
Usage
id_var(x, drop = FALSE)
Calculate interval spacing.
Description
Spacing for the first interval of each chromosome is undefined (NA). The leading interval of an overlapping interval pair has a negative value.
Usage
interval_spacing(x)
Arguments
x |
ivl_df |
Value
ivl_df with .spacing column.
See Also
Other utilities: bed12_to_exons(), bed_makewindows(), bound_intervals(), flip_strands()
Examples
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 1, 100,
"chr1", 150, 200,
"chr2", 200, 300
)
interval_spacing(x)
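One way to picture the spacing calculation is as the distance from each interval's start to the end of the previous interval on the same chromosome. The base R sketch below assumes exactly that definition; the helper spacing_sketch is hypothetical and only the non-overlapping case is shown.

```r
# Hypothetical helper: spacing = start minus the previous end within a chrom.
spacing_sketch <- function(x) {
  x <- x[order(x$chrom, x$start), , drop = FALSE]
  # previous interval's end per chromosome; NA for the first interval
  prev_end <- ave(x$end, x$chrom, FUN = function(e) c(NA, e[-length(e)]))
  x$.spacing <- x$start - prev_end
  x
}

x <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(1, 150, 200),
  end   = c(100, 200, 300)
)
res <- spacing_sketch(x)
```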
Bed-like data.frame requirements for valr functions
Description
Required column names for interval dataframes are chrom, start and end. Internally interval dataframes are validated using check_interval().
Required column names for genome dataframes are chrom and size. Internally genome dataframes are validated using check_genome().
Usage
check_interval(x)
check_genome(x)
Arguments
x |
A data.frame or tibble |
Examples
# using tibble
x <- tibble::tribble(
~chrom, ~start, ~end,
"chr1", 1, 50,
"chr1", 10, 75,
"chr1", 100, 120
)
check_interval(x)
# using base R data.frame
x <- data.frame(
chrom = "chr1",
start = 0,
end = 100,
stringsAsFactors = FALSE
)
check_interval(x)
# example genome input
x <- tibble::tribble(
~chrom, ~size,
"chr1", 1e6
)
check_genome(x)
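The column requirements stated above amount to a check on column names. A base R sketch of that check follows; the helper has_required_cols is hypothetical, and the real check_interval() and check_genome() may perform additional validation.

```r
# Hypothetical helper: are all required column names present?
has_required_cols <- function(x, required) {
  all(required %in% names(x))
}

intervals <- data.frame(chrom = "chr1", start = 0, end = 100)
genome <- data.frame(chrom = "chr1", size = 1e6)

ok_intervals <- has_required_cols(intervals, c("chrom", "start", "end"))
ok_genome <- has_required_cols(genome, c("chrom", "size"))
# a genome table lacks start and end, so it is not a valid interval table
genome_as_intervals <- has_required_cols(genome, c("chrom", "start", "end"))
```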
Read BED and related files.
Description
Read functions for BED and related formats. Filenames can be local files or URLs. The read functions load data into tbls with consistent chrom, start and end colnames.
Usage
read_bed(
filename,
col_types = bed12_coltypes,
sort = TRUE,
...,
n_fields = NULL
)
read_bed12(filename, ...)
read_bedgraph(filename, ...)
read_narrowpeak(filename, ...)
read_broadpeak(filename, ...)
Arguments
filename |
file or URL |
col_types |
column type spec for readr::read_tsv() |
sort |
sort the tbl by chrom and start |
... |
options to pass to readr::read_tsv() |
n_fields |
Details
https://genome.ucsc.edu/FAQ/FAQformat.html#format1
https://genome.ucsc.edu/goldenPath/help/bedgraph.html
https://genome.ucsc.edu/FAQ/FAQformat.html#format12
https://genome.ucsc.edu/FAQ/FAQformat.html#format13
Value
See Also
Other read functions: read_genome(), read_vcf()
Examples
# read_bed assumes 3 field BED format.
read_bed(valr_example("3fields.bed.gz"))
# result is sorted by chrom and start unless `sort = FALSE`
read_bed(valr_example("3fields.bed.gz"), sort = FALSE)
read_bed12(valr_example("mm9.refGene.bed.gz"))
read_bedgraph(valr_example("test.bg.gz"))
read_narrowpeak(valr_example("sample.narrowPeak.gz"))
read_broadpeak(valr_example("sample.broadPeak.gz"))
Read a bigwig file into a valr compatible tbl
Description
This function will output a 4 column tibble with zero-based chrom, start, end, value columns.
Usage
read_bigwig(path, ...)
Arguments
path |
path to bigWig file |
... |
params for cpp11bigwig::read_bigwig() |
Examples
read_bigwig(valr_example("hg19.dnase1.bw"))
read_bigwig(valr_example("hg19.dnase1.bw"), as = "GRanges")
Read genome files.
Description
Genome files (UCSC "chromSize" files) contain chromosome name and size information. These sizes are used by downstream functions to identify computed intervals that have coordinates outside of the genome bounds.
Usage
read_genome(path)
Arguments
path |
path to file containing chrom/contig names and sizes, one pair per line, tab-delimited |
Value
genome_df, sorted by size
Note
URLs to genome files can also be used.
See Also
Other read functions: read_bed(), read_vcf()
Examples
read_genome(valr_example("hg19.chrom.sizes.gz"))
## Not run:
# `read_genome` accepts a URL
read_genome("https://genome.ucsc.edu/goldenpath/help/hg19.chrom.sizes")
## End(Not run)
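For readers unfamiliar with the file layout, the base R sketch below parses a small tab-delimited chromosome size table from an inline string and sorts it by size. The chromosome sizes are made up for illustration, and sorting largest first is an assumption; the documentation only states that the result is sorted by size.

```r
# Made-up genome file content: chrom name and size, tab-delimited, one per line.
genome_text <- "chr2\t1000\nchr1\t5000\nchrM\t100\n"

genome <- read.delim(
  text = genome_text,
  header = FALSE,
  col.names = c("chrom", "size"),
  stringsAsFactors = FALSE
)

# sort by size, largest first (assumed direction)
genome <- genome[order(genome$size, decreasing = TRUE), ]
```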
Import and convert a GTF/GFF file into a valr compatible bed tbl format
Description
This function will output a tibble with the required chrom, start, and end columns, as well as other columns depending on content in GTF/GFF file.
Usage
read_gtf(path, zero_based = TRUE)
Arguments
path |
path to gtf or gff file |
zero_based |
if TRUE, convert to zero based |
Examples
## Not run:
gtf <- read_gtf(valr_example("hg19.gencode.gtf.gz"))
head(gtf)
## End(Not run)
Read a VCF file.
Description
Read a VCF file.
Usage
read_vcf(vcf)
Arguments
vcf |
vcf filename |
Value
data_frame
Note
The return value has chrom, start and end columns. Interval lengths are the size of the 'REF' field.
See Also
Other read functions: read_bed(), read_genome()
Examples
vcf_file <- valr_example("test.vcf.gz")
read_vcf(vcf_file)
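The note on interval lengths can be illustrated with a short base R sketch: the end coordinate is the start plus the number of characters in the REF allele. The variant records are made up, and taking start directly from the position column is an assumption; only the relationship between interval length and REF comes from the note above.

```r
# Made-up variant records: a position and a reference allele.
vcf <- data.frame(
  pos = c(100, 200),
  ref = c("A", "ACGT"),
  stringsAsFactors = FALSE
)

start <- vcf$pos                # assumed: start taken as-is from the position
end <- start + nchar(vcf$ref)   # interval length equals the REF allele length
lengths <- end - start
```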
Provide working directory for valr example files.
Description
Provide working directory for valr example files.
Usage
valr_example(path)
Arguments
path |
path to file |
Examples
valr_example("hg19.chrom.sizes.gz")