Type: Package
Title: Distance Matrix Utilities
Version: 0.4.0
Description: Functions to re-arrange, extract, and work with distances.
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
Suggests: testthat, dplyr, tibble, tidyr
NeedsCompilation: no
Packaged: 2020-03-01 12:00:06 UTC; bittingerk
Author: Kyle Bittinger [aut, cre]
Maintainer: Kyle Bittinger <kylebittinger@gmail.com>
Repository: CRAN
Date/Publication: 2020-03-01 21:30:03 UTC

usedist: a package for working with distance matrices in R

Description

In usedist, we provide a number of functions for working with distance matrix objects, such as those produced by the dist function. Some functions are geared towards making or altering distance matrix objects. Others relate to groups of items in the distance matrix. They provide access to within- or between-group distances, or use these distances to infer the distance to group centroids.


Compute the distance between group centroids

Description

Compute the distance between group centroids

Usage

dist_between_centroids(d, idx1, idx2, squared = FALSE)

Arguments

d

A distance matrix object of class dist.

idx1

A vector of items in group 1.

idx2

A vector of items in group 2.

squared

If TRUE, return the squared distance between centroids.

Details

If you have a distance matrix, and the objects are partitioned into groups, you might like to know the distance between the group centroids. The centroid of each group is simply the center of mass for the group.

It is possible to infer the distance between group centroids directly from the distances between items in each group. The adonis test in the ecology package vegan takes advantage of this approach to carry out an ANOVA-like test on distances.

The approach rests on the assumption that the objects occupy some high-dimensional Euclidean space. However, we do not have to actually create the space to find the distance between centroids. Based on the assumption that such a space exists, we can use an algebraic formula to perform the computation.

The formulas for this were presented by Apostol and Mnatsakanian in 2003, though we need to re-arrange equation 28 in their paper to get the value we want:

| c_1 - c_2 | = \sqrt{ \frac{1}{n_1 n_2} \sum_{(1,2)} - \frac{1}{n_1^2} \sum_{(1)} - \frac{1}{n_2^2} \sum_{(2)}},

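where n_1 and n_2 are the numbers of items in groups 1 and 2, \sum_{(1)} is the sum of squared distances over all pairs of items within group 1 (each pair counted once), \sum_{(2)} is defined likewise for group 2, and \sum_{(1,2)} is the sum of squared distances between each item in group 1 and each item in group 2.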

Sometimes the distance between centroids is not a real number, because the distances cannot be represented exactly in any Euclidean space. Mathematically, we get a negative number underneath the square root in the equation above. If this happens, the function returns NaN. If you would like access to this value, set squared = TRUE to return the squared distance between centroids. In this case you will never get NaN, but you may receive negative numbers in the result.
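The formula above can be checked numerically. The sketch below is illustrative only: centroid_dist_sketch is a hypothetical helper, not the package's implementation. It computes the centroid distance from pairwise distances alone and compares it with the distance between centroids computed directly from Euclidean coordinates.

```r
# Hypothetical helper (not part of usedist): infer the distance between two
# group centroids from pairwise distances only, using the formula above.
centroid_dist_sketch <- function(d, idx1, idx2, squared = FALSE) {
  d2 <- as.matrix(d)^2               # full symmetric matrix of squared distances
  n1 <- length(idx1)
  n2 <- length(idx2)
  # The full within-group submatrix counts each unordered pair twice,
  # so divide by 2 to get the sums over pairs used in the formula.
  sum1 <- sum(d2[idx1, idx1]) / 2
  sum2 <- sum(d2[idx2, idx2]) / 2
  sum12 <- sum(d2[idx1, idx2])       # every cross-group pair, counted once
  result_sq <- sum12 / (n1 * n2) - sum1 / n1^2 - sum2 / n2^2
  # sqrt() of a negative value returns NaN (with a warning).
  if (squared) result_sq else sqrt(result_sq)
}

# Compare with centroids computed directly in Euclidean space.
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)
d <- dist(x)
inferred <- centroid_dist_sketch(d, 1:2, 3:5)
direct <- sqrt(sum((colMeans(x[1:2, ]) - colMeans(x[3:5, ]))^2))
all.equal(inferred, direct)
```

For Euclidean input the two values agree up to floating-point error. The inferred value only differs from a "real" distance when the input distances are non-Euclidean.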

Value

The distance between group centroids (see details).

References

Apostol, T.M. and Mnatsakanian, M.A. Sums of squares of distances in m-space. Am. Math. Monthly 110, 516 (2003).


Retrieve distances from a dist object.

Description

Retrieve distances from a dist object.

Usage

dist_get(d, idx1, idx2)

Arguments

d

A distance matrix object of class dist.

idx1, idx2

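Indices (item labels or positions) specifying the distances to extract. Distances are returned for the pairs formed element-wise from idx1 and idx2.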

Value

A vector of distances.

Examples

m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
dist_get(dm4, "A", "C")
dist_get(dm4, "A", c("A", "B", "C", "D"))
dist_get(dm4, c("A", "B", "C"), c("B", "D", "B"))
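A dist object stores only the lower triangle of the distance matrix, column by column, so retrieving a distance means mapping a pair of items to a position in that vector. The sketch below shows one way to do this. dist_get_sketch is a hypothetical helper, not necessarily how dist_get is implemented; the position formula n*(i-1) - i*(i-1)/2 + j - i (for i < j) follows from the storage order described in the R documentation for dist.

```r
# Hypothetical helper (not part of usedist): look up distances for pairs of
# items by computing their positions in the dist object's storage vector.
dist_get_sketch <- function(d, idx1, idx2) {
  labels <- attr(d, "Labels")
  if (is.character(idx1)) idx1 <- match(idx1, labels)
  if (is.character(idx2)) idx2 <- match(idx2, labels)
  n <- attr(d, "Size")
  i <- pmin(idx1, idx2)      # pmin/pmax also recycle idx1 and idx2 together
  j <- pmax(idx1, idx2)
  pos <- n * (i - 1) - i * (i - 1) / 2 + j - i
  # The diagonal is not stored, so pairs with i == j get distance 0.
  out <- numeric(length(i))
  off_diag <- i != j
  out[off_diag] <- unclass(d)[pos[off_diag]]
  out
}

m4 <- matrix(1:16, nrow = 4, dimnames = list(LETTERS[1:4], NULL))
dm4 <- dist(m4)
dist_get_sketch(dm4, "A", "C")
dist_get_sketch(dm4, "A", c("A", "B", "C", "D"))
dist_get_sketch(dm4, c("A", "B", "C"), c("B", "D", "B"))
```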

Create a data frame of distances between groups of items.

Description

Create a data frame of distances between groups of items.

Usage

dist_groups(d, g)

Arguments

d

A distance matrix object of class dist.

g

A factor representing the groups of objects in d.

Value

A data frame with 6 columns:

Item1, Item2

The items being compared.

Group1, Group2

The groups to which the items belong.

Label

A convenient label for plotting or comparison.

Distance

The distance between Item1 and Item2.

Examples

m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
g4 <- rep(c("Control", "Treatment"), each=2)
dist_groups(dm4, g4)
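To illustrate the shape of the result, the sketch below builds a similar long data frame with base R by enumerating every pair of items once. dist_groups_sketch is a hypothetical helper, and the exact wording of the Label column ("Within ..." or "Between ... and ...") is an assumption made for illustration; the package's labels may differ.

```r
# Hypothetical helper (not part of usedist): one row per pair of items,
# with the items' groups, a descriptive label, and their distance.
dist_groups_sketch <- function(d, g) {
  n <- attr(d, "Size")
  labels <- attr(d, "Labels")
  if (is.null(labels)) labels <- seq_len(n)
  pairs <- combn(n, 2)       # each column is one pair (i, j) with i < j
  i <- pairs[1, ]
  j <- pairs[2, ]
  g1 <- g[i]
  g2 <- g[j]
  data.frame(
    Item1 = labels[i],
    Item2 = labels[j],
    Group1 = g1,
    Group2 = g2,
    # Assumed label format, for illustration only.
    Label = ifelse(g1 == g2, paste("Within", g1), paste("Between", g1, "and", g2)),
    Distance = as.matrix(d)[cbind(i, j)],
    stringsAsFactors = FALSE
  )
}

m4 <- matrix(1:16, nrow = 4, dimnames = list(LETTERS[1:4], NULL))
dm4 <- dist(m4)
g4 <- rep(c("Control", "Treatment"), each = 2)
res <- dist_groups_sketch(dm4, g4)
res
```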

Make a distance matrix using a custom distance function

Description

Make a distance matrix using a custom distance function

Usage

dist_make(x, distance_fcn, ...)

Arguments

x

A matrix of observations, one per row.

distance_fcn

A function used to compute the distance between two rows of the data matrix. The two rows will be passed as the first and second arguments to distance_fcn.

...

Additional arguments passed to distance_fcn.

Details

We do not set the call or method attributes of the dist object.

Value

A dist object containing the distances between rows of the data matrix.

Examples

x <- matrix(sin(1:30), nrow=5)
rownames(x) <- LETTERS[1:5]
manhattan_distance <- function (v1, v2) sum(abs(v1 - v2))
dist_make(x, manhattan_distance)
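The idea behind a custom-distance builder can be sketched as a double loop over pairs of rows, filling the lower triangle of a square matrix and converting it with as.dist. dist_make_sketch is a hypothetical name, and usedist's own implementation may differ. For Manhattan distance the result can be checked against the built-in dist(x, method = "manhattan").

```r
# Hypothetical helper (not part of usedist): apply distance_fcn to every
# pair of rows and collect the results in a dist object.
dist_make_sketch <- function(x, distance_fcn, ...) {
  n <- nrow(x)
  dm <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # as.dist() reads only the lower triangle, so fill entry [j, i].
      dm[j, i] <- distance_fcn(x[i, ], x[j, ], ...)
    }
  }
  as.dist(dm)
}

x <- matrix(sin(1:30), nrow = 5)
rownames(x) <- LETTERS[1:5]
manhattan_distance <- function(v1, v2) sum(abs(v1 - v2))
dist_make_sketch(x, manhattan_distance)
```

The loop calls distance_fcn n(n-1)/2 times, which is simple and flexible but slower than vectorized built-in methods for large inputs.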

Make a new distance matrix of centroid distances between multiple groups

Description

Make a new distance matrix of centroid distances between multiple groups

Usage

dist_multi_centroids(d, g, squared = FALSE)

Arguments

d

A distance matrix object of class dist.

g

A factor representing the groups of items in d.

squared

If TRUE, return the squared distance between centroids.

Value

A distance matrix of distances between the group centroids.
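One way to produce such a matrix is to apply the between-centroid formula (described under dist_between_centroids) to every pair of groups. The sketch below does this in base R; multi_centroids_sketch is hypothetical, and ordering the groups by unique(g) is an assumption rather than a documented property of dist_multi_centroids.

```r
# Hypothetical helper (not part of usedist): centroid distances between
# every pair of groups, returned as a dist object.
multi_centroids_sketch <- function(d, g, squared = FALSE) {
  d2 <- as.matrix(d)^2
  groups <- unique(g)                      # assumed group order
  k <- length(groups)
  nm <- as.character(groups)
  out <- matrix(0, k, k, dimnames = list(nm, nm))
  for (a in seq_len(k)) {
    for (b in seq_len(a - 1)) {            # lower triangle only
      i1 <- which(g == groups[a])
      i2 <- which(g == groups[b])
      # Full within-group submatrices count each pair twice, hence the 2.
      sq <- sum(d2[i1, i2]) / (length(i1) * length(i2)) -
        sum(d2[i1, i1]) / (2 * length(i1)^2) -
        sum(d2[i2, i2]) / (2 * length(i2)^2)
      out[a, b] <- if (squared) sq else sqrt(sq)
    }
  }
  as.dist(out)
}

# Groups a, b, c have centroids (1, 0), (11, 0), and (0, 7).
x <- matrix(c(0, 0, 2, 0, 10, 0, 12, 0, 0, 6, 0, 8), ncol = 2, byrow = TRUE)
g <- c("a", "a", "b", "b", "c", "c")
mc <- as.matrix(multi_centroids_sketch(dist(x), g))
mc
```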


Set the names/labels of a dist object.

Description

Set the names/labels of a dist object.

Usage

dist_setNames(d, nm)

Arguments

d

A distance matrix object of class dist.

nm

New labels for the rows/columns.

Value

A distance matrix with new row/column labels.
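A dist object keeps its item names in the "Labels" attribute (and its item count in "Size"), so renaming amounts to replacing that attribute. The sketch below assumes exactly that; set_dist_labels_sketch is a hypothetical helper, and the length check is an illustrative choice rather than documented behaviour of dist_setNames.

```r
# Hypothetical helper (not part of usedist): replace the labels of a dist
# object, requiring one new label per item.
set_dist_labels_sketch <- function(d, nm) {
  if (length(nm) != attr(d, "Size")) stop("nm must have one label per item")
  attr(d, "Labels") <- as.character(nm)
  d
}

m4 <- matrix(1:16, nrow = 4, dimnames = list(LETTERS[1:4], NULL))
dm4 <- set_dist_labels_sketch(dist(m4), LETTERS[9:12])
attr(dm4, "Labels")
```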

Examples

m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
dist_setNames(dm4, LETTERS[9:12])

Extract parts of a dist object.

Description

Extract a subset of values from a distance matrix. This function can also re-arrange the items (rows and columns) of a distance matrix, if the indices are provided in the desired order.

Usage

dist_subset(d, idx)

Arguments

d

A distance matrix object of class dist.

idx

Indices specifying the items to keep, in the desired order.

Value

A distance matrix.

Examples

m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
dist_subset(dm4, c("A", "B", "C"))
dist_subset(dm4, c("D", "C", "B", "A"))
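A straightforward way to subset or reorder a dist object is to expand it to the full square matrix, index rows and columns together, and convert back. subset_dist_sketch is a hypothetical helper used only to illustrate this approach; it is not necessarily how dist_subset works internally.

```r
# Hypothetical helper (not part of usedist): subset and/or reorder items.
subset_dist_sketch <- function(d, idx) {
  dm <- as.matrix(d)                     # full symmetric matrix with names
  as.dist(dm[idx, idx, drop = FALSE])    # same order for rows and columns
}

m4 <- matrix(1:16, nrow = 4, dimnames = list(LETTERS[1:4], NULL))
dm4 <- dist(m4)
sub3 <- subset_dist_sketch(dm4, c("A", "B", "C"))
rev4 <- subset_dist_sketch(dm4, c("D", "C", "B", "A"))
rev4
```

Going through the full matrix temporarily uses memory proportional to the square of the number of items, which is usually acceptable for the sizes that dist objects are used with.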

Compute distances from each item to group centroids

Description

Compute distances from each item to group centroids

Usage

dist_to_centroids(d, g, squared = FALSE)

Arguments

d

A distance matrix object of class dist.

g

A factor representing the groups of items in d.

squared

If TRUE, return the squared distance to group centroids.

Details

This function computes the distance from each item to the centroid positions of groups defined in the argument g. This is accomplished without determining the centroid positions directly; see the documentation for dist_between_centroids for details on this procedure.

If the distance can't be represented in a Euclidean space, the CentroidDistance value is set to NaN (with squared = TRUE, the squared value is returned instead and may be negative). See the documentation for dist_between_centroids for further details.
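The distance from a single item to a group centroid follows from the dist_between_centroids formula by treating the item as a group of size one: its within-group sum is zero, leaving (1/n) times the sum of squared distances from the item to the n group members, minus (1/n^2) times the sum of squared distances over pairs within the group. The sketch below applies this to every item and group; item_centroid_sketch is hypothetical, and its row order (items varying fastest, groups in unique(g) order) is an assumption.

```r
# Hypothetical helper (not part of usedist): distance from each item to
# each group centroid, in the long format described under Value.
item_centroid_sketch <- function(d, g, squared = FALSE) {
  d2 <- as.matrix(d)^2
  items <- attr(d, "Labels")
  if (is.null(items)) items <- seq_len(attr(d, "Size"))
  groups <- unique(g)
  res <- expand.grid(Item = seq_along(items), CentroidGroup = seq_along(groups))
  res$CentroidDistance <- mapply(function(i, k) {
    members <- which(g == groups[k])
    n <- length(members)
    # Full within-group submatrix counts each pair twice, hence 2 * n^2.
    sq <- sum(d2[i, members]) / n - sum(d2[members, members]) / (2 * n^2)
    if (squared) sq else sqrt(sq)
  }, res$Item, res$CentroidGroup)
  res$Item <- items[res$Item]
  res$CentroidGroup <- groups[res$CentroidGroup]
  res
}

# Points on a line at 0, 2, 10; group a has centroid 1, group b centroid 10.
res <- item_centroid_sketch(dist(c(0, 2, 10)), c("a", "a", "b"))
res
```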

Value

A data frame with distances to the group centroids:

Item

A character vector of item labels from the dist object, or an integer vector of item locations if labels are not present.

CentroidGroup

The group for which the centroid distance is given. The column type should match that of the argument g (the unique function is used to generate this column).

CentroidDistance

Inferred distance from the item to the centroid position of the indicated group.


Convert a data frame in long format to a numeric matrix

Description

Convert a data frame in long format to a numeric matrix

Usage

pivot_to_numeric_matrix(data, obs_col, feature_col, value_col)

Arguments

data

A data frame with numerical values in long format.

obs_col

The column listing the observation, or row of the matrix.

feature_col

The column listing the feature, or column of the matrix.

value_col

The column listing the value, to be placed inside the matrix.

Details

The parameters obs_col, feature_col, and value_col should be provided as bare column names. If any combination of row and column does not appear in the data frame, a zero is entered in the resulting matrix.

This function requires the packages dplyr, tibble, and tidyr to be installed. If they are not installed, the function generates an error with a message to install the appropriate packages.
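For comparison, the same reshaping can be sketched in base R without dplyr, tibble, or tidyr. pivot_sketch is a hypothetical helper, not the package's implementation: unlike pivot_to_numeric_matrix it takes column names as strings rather than bare names, and if an observation/feature combination appears more than once, the last value silently wins.

```r
# Hypothetical base-R alternative (not part of usedist): long data frame
# to numeric matrix, with zeros for missing combinations.
pivot_sketch <- function(data, obs_col, feature_col, value_col) {
  obs <- as.character(data[[obs_col]])
  feat <- as.character(data[[feature_col]])
  rows <- unique(obs)
  cols <- unique(feat)
  m <- matrix(0, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))
  # Indexing with a two-column character matrix matches row and column names.
  m[cbind(obs, feat)] <- data[[value_col]]
  m
}

longdata <- data.frame(
  SampleID = paste0("Sample", c(1, 1, 1, 2, 2, 3, 3)),
  FeatureID = paste0("Feature", c(1, 2, 3, 1, 2, 2, 3)),
  Value = c(132, 41, 7, 56, 11, 929, 83))
wide <- pivot_sketch(longdata, "SampleID", "FeatureID", "Value")
wide
```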

Examples

longdata <- data.frame(
  SampleID = paste0("Sample", c(1, 1, 1, 2, 2, 3, 3)),
  FeatureID = paste0("Feature", c(1, 2, 3, 1, 2, 2, 3)),
  Value = c(132, 41, 7, 56, 11, 929, 83))
longdata
pivot_to_numeric_matrix(longdata, SampleID, FeatureID, Value)