Type: | Package |
Title: | Distance Matrix Utilities |
Version: | 0.4.0 |
Description: | Functions to re-arrange, extract, and work with distances. |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 6.1.1 |
Suggests: | testthat, dplyr, tibble, tidyr |
NeedsCompilation: | no |
Packaged: | 2020-03-01 12:00:06 UTC; bittingerk |
Author: | Kyle Bittinger [aut, cre] |
Maintainer: | Kyle Bittinger <kylebittinger@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2020-03-01 21:30:03 UTC |
usedist: a package for working with distance matrices in R
Description
In usedist, we provide a number of functions to help with distance matrix
objects, such as those produced by the dist function. Some functions are
geared towards making or altering distance matrix objects. Others relate to
groups of items in the distance matrix. They provide access to within- or
between-group distances, or use these distances to infer the distance to
group centroids.
Compute the distance between group centroids
Description
Compute the distance between group centroids
Usage
dist_between_centroids(d, idx1, idx2, squared = FALSE)
Arguments
d |
A distance matrix object of class dist. |
idx1 |
A vector of items in group 1. |
idx2 |
A vector of items in group 2. |
squared |
If TRUE, return the squared distance between centroids. |
Details
If you have a distance matrix, and the objects are partitioned into groups, you might like to know the distance between the group centroids. The centroid of each group is simply the center of mass for the group.
It is possible to infer the distance between group centroids directly from
the distances between items in each group. The adonis test in the ecology
package vegan takes advantage of this approach to carry out an ANOVA-like
test on distances.
The approach rests on the assumption that the objects occupy some high-dimensional Euclidean space. However, we do not have to actually create the space to find the distance between centroids. Based on the assumption that such a space exists, we can use an algebraic formula to perform the computation.
The formulas for this were presented by Apostol and Mnatsakanian in 2003, though we need to re-arrange equation 28 in their paper to get the value we want:
| c_1 - c_2 | = \sqrt{
\frac{1}{n_1 n_2} \sum_{(1,2)} -
\frac{1}{n_1^2} \sum_{(1)} -
\frac{1}{n_2^2} \sum_{(2)}},
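where n_1 is the number of samples in group 1, \sum_{(1)} is the sum of
squared distances between items in group 1 (each pair of items counted
once), and \sum_{(1,2)} is the sum of squared distances between items in
group 1 and those in group 2. The quantities n_2 and \sum_{(2)} are defined
in the same way for group 2.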
Sometimes, the distance between centroids is not a real number, because it
is not possible to create a space where this distance exists. Mathematically,
we get a negative number underneath the square root in the equation above.
If this happens, the function returns NaN. If you'd like to have access to
this value, you can set squared = TRUE to return the squared distance between
centroids. In this case, you will never get NaN, but you might receive
negative numbers in your result.
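The formula above can be checked numerically. The following is a minimal base-R sketch, not the package's own implementation: the helper name centroid_dist_sq and the random data are made up for illustration. It computes the squared centroid distance from pairwise distances alone, then compares the result with centroids computed directly from the coordinates.

```r
# Two small groups of points in the plane (made-up data).
set.seed(1)
x <- rbind(matrix(rnorm(8), ncol = 2), matrix(rnorm(6, mean = 3), ncol = 2))
rownames(x) <- LETTERS[1:7]
d <- dist(x)
idx1 <- c("A", "B", "C", "D")
idx2 <- c("E", "F", "G")

# Hypothetical helper: squared centroid distance from pairwise distances only.
centroid_dist_sq <- function(d, idx1, idx2) {
  m2 <- as.matrix(d)^2
  n1 <- length(idx1)
  n2 <- length(idx2)
  sum12 <- sum(m2[idx1, idx2])      # every pair across the two groups
  sum1 <- sum(m2[idx1, idx1]) / 2   # each within-group pair counted once
  sum2 <- sum(m2[idx2, idx2]) / 2
  sum12 / (n1 * n2) - sum1 / n1^2 - sum2 / n2^2
}

from_distances <- sqrt(centroid_dist_sq(d, idx1, idx2))

# Direct computation, possible here only because we kept the coordinates.
c1 <- colMeans(x[idx1, , drop = FALSE])
c2 <- colMeans(x[idx2, , drop = FALSE])
direct <- sqrt(sum((c1 - c2)^2))

all.equal(from_distances, direct)
```

Because these distances are Euclidean, the value under the square root is non-negative. With a non-Euclidean distance, centroid_dist_sq could return a negative number, which is the case that squared = TRUE is meant to expose.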
Value
The distance between group centroids (see details).
References
Apostol, T.M. and Mnatsakanian, M.A. Sums of squares of distances in m-space. Amer. Math. Monthly 110, 516 (2003).
Retrieve distances from a dist object.
Description
Retrieve distances from a dist object.
Usage
dist_get(d, idx1, idx2)
Arguments
d |
A distance matrix object of class dist. |
idx1, idx2 |
Indices specifying the distances to extract. |
Value
A vector of distances.
Examples
m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
dist_get(dm4, "A", "C")
dist_get(dm4, "A", c("A", "B", "C", "D"))
dist_get(dm4, c("A", "B", "C"), c("B", "D", "B"))
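For comparison, the last example can be reproduced in base R. This sketch assumes that idx1 and idx2 are paired element by element, as the examples above suggest. It converts the dist object to a full matrix and indexes it with a two-column matrix of labels.

```r
m4 <- matrix(1:16, nrow = 4, dimnames = list(LETTERS[1:4]))
dm4 <- dist(m4)

# Full square matrix with row and column labels A to D.
full <- as.matrix(dm4)

# Each row of `pairs` names one (row, column) cell to extract.
pairs <- cbind(c("A", "B", "C"), c("B", "D", "B"))
picked <- full[pairs]
picked  # 2 4 2
```

The full matrix takes memory proportional to the square of the number of items, so this form is mainly useful for small inputs or for checking results.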
Create a data frame of distances between groups of items.
Description
Create a data frame of distances between groups of items.
Usage
dist_groups(d, g)
Arguments
d |
A distance matrix object of class dist. |
g |
A factor representing the groups of objects in d. |
Value
A data frame with 6 columns:
- Item1, Item2
The items being compared.
- Group1, Group2
The groups to which the items belong.
- Label
A convenient label for plotting or comparison.
- Distance
The distance between Item1 and Item2.
Examples
m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
g4 <- rep(c("Control", "Treatment"), each=2)
dist_groups(dm4, g4)
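To make the shape of this output concrete, here is a base-R sketch that builds a similar long table by hand and summarizes within- and between-group distances. It is an illustration only: the Label column is omitted, and the Within column is a made-up stand-in for it.

```r
m4 <- matrix(1:16, nrow = 4, dimnames = list(LETTERS[1:4]))
dm4 <- dist(m4)
g4 <- rep(c("Control", "Treatment"), each = 2)

items <- attr(dm4, "Labels")
# One row per unordered pair of items (i < j).
pairs <- t(combn(length(items), 2))

long <- data.frame(
  Item1 = items[pairs[, 1]],
  Item2 = items[pairs[, 2]],
  Group1 = g4[pairs[, 1]],
  Group2 = g4[pairs[, 2]],
  Distance = as.matrix(dm4)[pairs]
)

# Hypothetical stand-in for Label: TRUE when both items share a group.
long$Within <- long$Group1 == long$Group2
aggregate(Distance ~ Within, data = long, FUN = mean)
```

A table in this form works directly with grouping and plotting tools, which is the convenience a long format provides.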
Make a distance matrix using a custom distance function
Description
Make a distance matrix using a custom distance function
Usage
dist_make(x, distance_fcn, ...)
Arguments
x |
A matrix of observations, one per row |
distance_fcn |
A function used to compute the distance between two
rows of the data matrix. The two rows will be passed as the first and
second arguments to distance_fcn. |
... |
Additional arguments passed to distance_fcn. |
Details
We do not set the call or method attributes of the dist object.
Value
A dist object containing the distances between rows of the data matrix.
Examples
x <- matrix(sin(1:30), nrow=5)
rownames(x) <- LETTERS[1:5]
manhattan_distance <- function (v1, v2) sum(abs(v1 - v2))
dist_make(x, manhattan_distance)
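The idea behind this function can be sketched in base R. The helper make_dist_sketch below is hypothetical and is not the package's code: it applies distance_fcn to each pair of rows, fills the lower triangle of a square matrix, and converts the result with as.dist. The check at the end compares it with the built-in Manhattan distance.

```r
# Hypothetical helper illustrating the idea; not the package's own code.
make_dist_sketch <- function(x, distance_fcn, ...) {
  n <- nrow(x)
  m <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  if (n > 1) {
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        # Row i and row j are passed as the first and second arguments.
        m[j, i] <- distance_fcn(x[i, ], x[j, ], ...)
      }
    }
  }
  as.dist(m)  # as.dist reads the lower triangle only
}

x <- matrix(sin(1:30), nrow = 5)
rownames(x) <- LETTERS[1:5]
manhattan_distance <- function(v1, v2) sum(abs(v1 - v2))

d_custom <- make_dist_sketch(x, manhattan_distance)
d_builtin <- dist(x, method = "manhattan")
all.equal(as.vector(d_custom), as.vector(d_builtin))
```

Calling an R function once per pair of rows is slow for large matrices, so a built-in method is preferable whenever one exists for the distance you need.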
Make a new distance matrix of centroid distances between multiple groups
Description
Make a new distance matrix of centroid distances between multiple groups
Usage
dist_multi_centroids(d, g, squared = FALSE)
Arguments
d |
A distance matrix object of class dist. |
g |
A factor representing the groups of items in d. |
squared |
If TRUE, return the squared distance between centroids. |
Value
A distance matrix of distances between the group centroids.
Set the names/labels of a dist object.
Description
Set the names/labels of a dist object.
Usage
dist_setNames(d, nm)
Arguments
d |
A distance matrix object of class dist. |
nm |
New labels for the rows/columns. |
Value
A distance matrix with new row/column labels.
Examples
m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
dist_setNames(dm4, LETTERS[9:12])
Extract parts of a dist object.
Description
Extract a subset of values from a distance matrix. This function can also re-arrange the rows of a distance matrix: supply the indices in the desired order.
Usage
dist_subset(d, idx)
Arguments
d |
A distance matrix object of class dist. |
idx |
Indices specifying the subset of distances to extract. |
Value
A distance matrix.
Examples
m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
dist_subset(dm4, c("A", "B", "C"))
dist_subset(dm4, c("D", "C", "B", "A"))
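The re-ordering in the last example has a simple base-R counterpart, shown here as a sketch for comparison rather than as the package's implementation: convert to a full matrix, index rows and columns with the same vector, and convert back.

```r
m4 <- matrix(1:16, nrow = 4, dimnames = list(LETTERS[1:4]))
dm4 <- dist(m4)

idx <- c("D", "C", "B", "A")
# Index rows and columns identically so the matrix stays symmetric.
reordered <- as.dist(as.matrix(dm4)[idx, idx])
attr(reordered, "Labels")
```

Using the same index vector for rows and columns is what keeps the result a valid distance matrix.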
Compute distances from each item to group centroids
Description
Compute distances from each item to group centroids
Usage
dist_to_centroids(d, g, squared = FALSE)
Arguments
d |
A distance matrix object of class dist. |
g |
A factor representing the groups of items in d. |
squared |
If TRUE, return the squared distance to centroids. |
Details
This function computes the distance from each item to the centroid positions
of groups defined in the argument g. This is accomplished without
determining the centroid positions directly; see the documentation for
dist_between_centroids for details on this procedure.
If the distance can't be represented in a Euclidean space, the
CentroidDistance is set to NaN. See the documentation for
dist_between_centroids for further details.
Value
A data frame with distances to the group centroids:
- Item
A character vector of item labels from the dist object, or an integer vector of item locations if labels are not present.
- CentroidGroup
The group for which the centroid distance is given. The column type should match that of the argument g (the unique function is used to generate this column).
- CentroidDistance
Inferred distance from the item to the centroid position of the indicated group.
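The distance from one item to a group centroid is the special case of the dist_between_centroids formula in which the first group holds a single item, so its within-group term is zero. The base-R sketch below uses made-up data and variable names and is not the package's code. It infers one such distance from pairwise distances and checks it against the coordinates.

```r
set.seed(2)
x <- matrix(rnorm(12), ncol = 2, dimnames = list(letters[1:6], NULL))
d <- dist(x)

grp <- c("a", "b", "c", "d")  # group whose centroid we want
item <- "f"                   # item measured against that centroid

m2 <- as.matrix(d)^2
n <- length(grp)

# Mean squared distance from the item to the group members, minus the
# within-group sum of squared distances (each pair once) divided by n^2.
sq <- sum(m2[item, grp]) / n - (sum(m2[grp, grp]) / 2) / n^2
inferred <- sqrt(sq)

# Direct computation from coordinates, for comparison only.
direct <- sqrt(sum((x[item, ] - colMeans(x[grp, ]))^2))
all.equal(inferred, direct)
```

If sq were negative, sqrt would return NaN with a warning, which corresponds to the NaN case described in the Details.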
Convert a data frame in long format to a numeric matrix
Description
Convert a data frame in long format to a numeric matrix
Usage
pivot_to_numeric_matrix(data, obs_col, feature_col, value_col)
Arguments
data |
A data frame with numerical values in long format. |
obs_col |
The column listing the observation, or row of the matrix. |
feature_col |
The column listing the feature, or column of the matrix. |
value_col |
The column listing the value, to be placed inside the matrix. The parameters This function requires the packages |
Examples
longdata <- data.frame(
SampleID = paste0("Sample", c(1, 1, 1, 2, 2, 3, 3)),
FeatureID = paste0("Feature", c(1, 2, 3, 1, 2, 2, 3)),
Value = c(132, 41, 7, 56, 11, 929, 83))
longdata
pivot_to_numeric_matrix(longdata, SampleID, FeatureID, Value)
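The reshaping can also be sketched in base R, which may help when checking results. This is an illustration under one stated assumption: combinations of observation and feature that are absent from the data are filled with 0. This section does not say how pivot_to_numeric_matrix treats such gaps, so the fill value is a choice made for the sketch.

```r
longdata <- data.frame(
  SampleID = paste0("Sample", c(1, 1, 1, 2, 2, 3, 3)),
  FeatureID = paste0("Feature", c(1, 2, 3, 1, 2, 2, 3)),
  Value = c(132, 41, 7, 56, 11, 929, 83))

obs <- unique(as.character(longdata$SampleID))
feat <- unique(as.character(longdata$FeatureID))

# Start from zeros (assumed fill value), then place each value in its cell.
mat <- matrix(0, nrow = length(obs), ncol = length(feat),
              dimnames = list(obs, feat))
cells <- cbind(as.character(longdata$SampleID),
               as.character(longdata$FeatureID))
mat[cells] <- longdata$Value
mat
```

The result is a plain numeric matrix with observations as rows and features as columns, which can be passed to dist.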