Type: | Package |
Title: | PARAFAC Analysis of Fluorescence Excitation-Emission Matrices |
Version: | 0.3-8 |
Depends: | R (≥ 3.3) |
Imports: | multiway (≥ 1.0-4), CMLS, pracma, lattice, parallel, Matrix |
Enhances: | eemR, EEM |
Description: | Perform parallel factor analysis (PARAFAC: Hitchcock, 1927) <doi:10.1002/sapm192761164> on fluorescence excitation-emission matrices: handle scattering signal and inner filter effect, scale the dataset, fit the model; perform split-half validation or jack-knifing. Modified approaches such as Whittaker interpolation, randomised split-half, and fluorescence and scattering model estimation are also available. The package has a low dependency footprint and has been tested on a wide range of R versions. |
License: | GPL (≥ 3) |
BuildResaveData: | no |
BuildManual: | yes |
NeedsCompilation: | no |
Packaged: | 2024-05-07 11:48:43 UTC; ivan |
Author: | Ivan Krylov [aut, cre], Timur Labutin [ths], Anastasia Drozdova [rev] |
Maintainer: | Ivan Krylov <ikrylov@laser.chem.msu.ru> |
Repository: | CRAN |
Date/Publication: | 2024-05-07 15:50:02 UTC |
PARAFAC Analysis of Fluorescence Excitation-Emission Matrices
Description
Day after day, day after day,
We stuck, nor breath nor motion;
As idle as a painted ship
Upon a painted ocean.
Water, water, every where,
And all the boards did shrink;
Water, water, every where,
Nor any drop to drink.
– Samuel Taylor Coleridge, The Rime of the Ancient Mariner
Perform parallel factor analysis (PARAFAC: Hitchcock, 1927) <doi:10.1002/sapm192761164> on fluorescence excitation-emission matrices: handle scattering signal and inner filter effect, scale the dataset, fit the model; perform split-half validation or jack-knifing. Modified approaches such as Whittaker interpolation, randomised split-half, and fluorescence and scattering model estimation are also available. The package has a low dependency footprint and has been tested on a wide range of R versions.
Details
In order to work with your data, create feem
and/or
feemcube
objects from files or matrix or array objects.
Use feemlist
to import files in bulk. If your files
aren't in one of the formats supported by feem
but you
can read their contents by other means, you can supply an importer
function to feemlist
; it should take a file name and
return the corresponding feem
object.
Operations that can be performed on the objects include plotting
(plot.feem
), calculation of fluorescence indices
(feemindex
), inner-filter effect correction
(feemife
), handling of scattering signal
(feemscatter
), changing the wavelength grid of the data
by means of interpolation (feemgrid
), and scaling
(feemscale
). Scaling may be automatically undone after
performing the PARAFAC decomposition so that the resulting scores
would correspond to the data as it was before the scaling.
All processing functions can take individual feem
objects, lists of them, or feemcube
objects and return
values of the appropriate kind. For example, feemscatter
always returns an object of the same class but with the scattering
signal handled, while feemindex
returns named numeric
vectors for individual feem
s but
data.frame
s for collections of them. There's a
slight memory benefit to using lists of feem
objects,
but the difference shouldn't be noticeable, so there's nothing to
worry about if you started with a feemcube
.
In order to compute PARAFAC, you need to convert your data into a
feemcube
. Whether you perform jack-knifing, split-half
analysis, or PARAFAC itself, a copy of the data cube is kept together
with the results and can be extracted back using the
feemcube
function. The resulting objects support a
plot
method (described in the same help page) and can give you
the data as a few-column data.frame
using the
coef
method.
Once the analysis is finished, the PARAFAC model can be exported for
the OpenFluor database (write.openfluor
) or stored as an
R object using standard R tools (save
or
saveRDS
).
Index of help topics:
[.feem Extract or replace parts of FEEM objects [.feemcube Extract or replace parts of FEEM cubes absindex Functions of absorbance data albatross-package PARAFAC Analysis of Fluorescence Excitation-Emission Matrices as.data.frame.feem Transform a FEEM object into a data.frame feem Create a fluorescence excitation-emission matrix object feemcorcondia Core consistency diagnostic for PARAFAC models feemcube Data cubes of fluorescence excitation-emission matrices feemflame Fluorescence and scAttering Model Estimation feemgrid Interpolate FEEMs on a given wavelength grid feemife Absorbance-based inner filter effect correction feemindex Fluorescence indices and peak values feemjackknife Jack-knife outlier detection in PARAFAC models feemlist Create lists of FEEM objects feemparafac Compute PARAFAC on a FEEM cube object and access the results feems Synthetic fluorescence excitation-emission matrices and absorbance spectra feemscale Rescale FEEM spectra to a given norm and remember the scale factor feemscatter Handle scattering signal in FEEMs feemsplithalf Split-half analysis of PARAFAC models marine.colours Perceptually uniform palettes plot.feem Plot a FEEM object write.openfluor Export a PARAFAC model for the OpenFluor database
Author(s)
Ivan Krylov [aut, cre], Timur Labutin [ths], Anastasia Drozdova [rev]
References
Murphy KR, Stedmon CA, Graeber D, Bro R (2013). “Fluorescence spectroscopy and multi-way techniques. PARAFAC.” Analytical Methods, 5, 6557-6566. doi:10.1039/c3ay41160e.
Pucher M, Wünsch U, Weigelhofer G, Murphy K, Hein T, Graeber D (2019). “staRdom: Versatile Software for Analyzing Spectroscopic Data of Dissolved Organic Matter in R.” Water, 11(11), 2366. doi:10.3390/w11112366.
Cleese J, Jones T (1970). “Albatross: Flavours of different sea birds.” Journal of Flying Circus, 1.13, 7:05-7:45.
Krylov I, Drozdova A, Labutin T (2020). “Albatross R package to study PARAFAC components of DOM fluorescence from mixing zones of arctic shelf seas.” Chemometrics and Intelligent Laboratory Systems, 207(104176). doi:10.1016/j.chemolab.2020.104176.
See Also
feem
, feemlist
, feemindex
,
feemife
, feemscatter
,
feemgrid
, feemcube
,
feemscale
, feemsplithalf
,
feemparafac
, feemjackknife
,
feemflame
, absindex
.
Examples
data(feems)
dataset <- feemcube(feems, FALSE)
dataset <- feemscatter(dataset, c(24, 15, 10), 'whittaker')
dataset <- feemife(dataset, absorp)
plot(dataset <- feemscale(dataset, na.rm = TRUE))
# takes a long time
(sh <- feemsplithalf(
feemscatter(cube, c(24, 15)),
# in real life, set a stricter stopping criterion, 1e-6..1e-8
nfac = 2:4, splits = 4, ctol = 1e-4
))
plot(sh)
pf <- feemparafac(cube, nfac = 3, ctol = 1e-4)
plot(pf)
Extract or replace parts of FEEM objects
Description
Extract or replace parts of FEEM spectra. Returns FEEM objects unless dimensions should be dropped. When assigning from a FEEM object, requires wavelengths to match and warns if scale factors differ.
Usage
## S3 method for class 'feem'
x[i, j, drop = TRUE]
## S3 replacement method for class 'feem'
x[i, j] <- value
Arguments
x |
A FEEM object. |
i , j |
Row and column indices, respectively. As in usual R subsetting (see Extract), may be integer, logical or character vectors, or missing. |
drop |
Coerce result to the lowest possible dimension (dropping the
|
value |
An array-like object to assign values from. When assigning from FEEM objects, wavelengths are required to match and warnings are issued if scale factors don't match. Use vector subsetting (zero or one argument inside the brackets) to disable the check. |
Value
For [
: If drop
is TRUE
and at least one of the
index arguments chooses only one element along its axis, a named
numeric vector. Otherwise, a FEEM object.
For [<-
: a FEEM object.
See Also
Examples
(z <- feem(matrix(1:40, ncol = 8), 66 + 1:5, 99 + 1:8, 3))
str(z[1:4, 1:2])
str(z[1,, drop = TRUE])
z[2:3, 4:5] <- feem(matrix(1:4, 2), 66 + 2:3, 99 + 4:5, 3)
z
Extract or replace parts of FEEM cubes
Description
Extract or replace single intensities, vectors of them, whole FEEM spectra or even data cubes or their parts from a FEEM cube.
Usage
## S3 method for class 'feemcube'
x[i, j, k, drop = TRUE]
## S3 replacement method for class 'feemcube'
x[i, j, k] <- value
Arguments
x |
A FEEM cube object. |
i , j , k |
Row, column and sample indices, respectively. As usual, may be integer, logical or character vectors. Omitting a parameter results in choosing the whole axis. |
drop |
Coerce result to the lowest possible dimension (dropping the
|
value |
An array-like object to assign values from. When assigning from FEEM or FEEM cube objects, wavelengths are required to match and warnings are issued if scale factors don't match. Use vector subsetting (zero or one argument inside the brackets) to disable the check. |
Value
For [
: If choosing multiple values along each axis or
drop
is FALSE
, a FEEM cube object. If choosing only one
sample but multiple wavelengths, a FEEM object. Otherwise, a named
numeric matrix or vector, depending on the dimensions chosen.
For [<-
: a FEEM cube object.
See Also
Examples
z <- feemcube(array(1:385, c(5, 7, 11)), 1:5, 1:7, 1:11)
str(z[1:4, 1:2, 1:2])
z[2:3, 4:5, 3] <- feem(matrix(1:4, 2), 2:3, 4:5, 3)
z[,,3]
Functions of absorbance data
Description
Calculate absorption coefficients and/or absorbance data at given wavelengths, spectral slopes, and their ratios.
Usage
absindex(
x, abs.path, unit = c("log10", "m^-1"), out.A = 254,
out.a = c(350, 355, 374, 443),
out.a.ratio = list(c(250, 365), c(465, 665)),
out.slope = list(c(275, 295), c(350, 400)),
out.slope.ratio = list(c(275, 295, 350, 400)),
out.slope.nrmse = FALSE
)
Arguments
x |
Absorption data, either a |
abs.path |
A numeric vector of optical path lengths for every spectrum, in
centimetres. Defaults to |
unit |
Specifies whether |
out.A |
Return absorbance values at the wavelengths given as a numeric vector. |
out.a |
Return absorption coefficients at the wavelengths given as a numeric vector. |
out.a.ratio |
Return ratios of absorption coefficients at the wavelengths given as
a list of two-element numeric vectors. For every pair of
wavelengths, |
out.slope |
Return spectral slopes at wavelength ranges given as a list of
two-element numeric vectors. See the |
out.slope.ratio |
Return ratios of spectral slopes for pairs of wavelength ranges
given as a list of four-element numeric vectors. For every list
element, the value returned is |
out.slope.nrmse |
When computing slopes, also return the root-mean-square error for
the models providing them, divided by the range of the response:
|
Details
Currently, the spectral slopes are calculated by fitting a linear
model
\ln \alpha = b_0 - b_1 \lambda
and returning b_1
as the slope. See
(Twardowski, Boss, Sullivan, and Donaghay 2004) for a discussion of the calculation methods for
spectral slopes.
Requested wavelengths missing from the original grid are interpolated
using spline
. NA
values are returned
outside the original wavelength range.
Value
A data.frame
with one row per sample, containing
the following columns:
sample |
Names or numbers of the samples. |
A.<wavelength> |
Absorbance values, for every wavelength in |
a.<wavelength> |
Absorption coefficients, for every wavelength in |
aR.<wl[1]>.<wl[2]> |
Ratios of absorption coefficients, for every wl in
|
S.<wl[1]>.<wl[2]> |
Spectral slopes, for every wl in |
NRMSE.S.<wl[1]>.<wl[2]> |
If |
SR.<wl[1]>.<wl[2]>.<wl[3]>.<wl[4]> |
Ratios of spectral slopes, for every wl in
|
References
Twardowski MS, Boss E, Sullivan JM, Donaghay PL (2004). “Modeling the spectral shape of absorption by chromophoric dissolved organic matter.” Marine Chemistry, 89(1), 69-88. doi:10.1016/j.marchem.2004.02.008.
See Also
Examples
data(feems)
absindex(absorp)
Transform a FEEM object into a data.frame
Description
Transform a FEEM object from its matrix form accompanied by vectors of
wavelengths into a three-column form consisting of
(\lambda_\mathrm{em}, \lambda_\mathrm{ex}, I)
tuples, which could be useful for
export or plotting with lattice or ggplot2.
Usage
## S3 method for class 'feem'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
## S3 method for class 'feemcube'
as.data.frame(x, ...)
Arguments
x |
A FEEM object, or a FEEM cube object. |
row.names |
Passed to |
optional |
This option is required for compatibility with
|
... |
Passed as-is to |
Details
Rows where intensity is NA
are omitted from the output.
Value
A data.frame
containing three numeric columns:
emission |
Emission wavelength, nm. |
excitation |
Excitation wavelength, nm. |
intensity |
Fluorescence intensity at
|
sample |
For FEEM cube objects, the unique name of the sample possessing this
tuple of values, a factor. If the original object didn't have any
names, sequential integers are used instead. If the original object
had non-unique names, sequence numbers are appended to them using
|
See Also
Examples
z <- feem(matrix(1:42, nrow = 7), 1:7, 1:6)
head(as.data.frame(z))
Implementation notes for constrained matrix factorisation
Description
- cmf
-
Compute a low-rank matrix factorisation
\min_{\mathbf A, \mathbf B} || (\mathbf X - \mathbf A \mathbf{B}^\top ) \circ \mathbf W ||_\mathrm F
subject to weights\mathbf W
(set to0
where\mathbf X
is not defined) and constraints on rows of\mathbf{A}, \mathbf{B}
. - wcmls
-
Solve the weighted multivariate least squares problem
\min_\mathbf{B} || (\mathbf X - \mathbf A \mathbf{B}^\top) \circ \mathbf W ||_\mathrm F
subject to constraints on rows of\mathbf B
.
This is not a public interface. Subject to change without further notice. Please do not call from outside albatross.
Usage
cmf(
X, nfac = 1,
const = list(list(const = "nonneg"), list(const = "nonneg")),
start = c("svd", "random"), ctol = 1e-04, maxit = 10
)
## S3 method for class 'cmf'
fitted(object, ...)
wcmls(X, A, W, ..., struc = NULL)
Arguments
X |
The matrix for a low-rank approximation. |
nfac |
The rank of the factorisation; the number of columns in matrices
|
const |
Constraints on the two matrices: a list of two lists of arguments to
pass to |
start |
A
|
ctol |
Given |
maxit |
Iteration number limit. |
object |
An object of class |
A |
The predictor matrix in the weighted multivariate least squares problem. |
W |
The weights matrix. |
... , struc |
|
Details
The CMLS package function cmls
can solve
constrained multivariate least squares problems of the form:
\min_\mathbf{B} || \mathbf X - \mathbf A \mathbf B ||_\mathrm F
= L(\mathbf X, \mathbf A, \mathbf B)
We use it to solve a weighted problem. Let
\mathbf X, \mathbf W
be
(m \times n)
matrices,
\mathbf A
be an
(m \times k)
matrix,
\mathbf B
be an
(n \times k)
matrix,
\mathbf{J}_{p,q}
be a
(p \times q)
matrix of ones:
\min_\mathbf{B}
|| \mathbf W \circ (\mathbf X - \mathbf A \mathbf B^\top) ||_\mathrm F
= \sum_{i,j} (
w_{i,j} x_{i,j} - w_{i,j} \mathbf{a}_{i,\cdot} \mathbf{b}_{j,\cdot}^\top
)^2
= {}
{} = \sum_j ||
\mathbf{w}_{\cdot,j} \circ \mathbf{x}_{\cdot,j} - (
(\mathbf{w}_{\cdot,j} \mathbf{J}_{1,k}) \circ \mathbf A
) \mathbf{b}_{j,\cdot}^\top
||_\mathrm F
= \sum_j L(
\mathbf{w}_{\cdot,j} \circ \mathbf{x}_{\cdot,j},
(\mathbf{w}_{\cdot,j} \mathbf{J}_{1,k}) \circ \mathbf A,
\mathbf{b}_{j,\cdot}^\top
)
Here, \mathbf{w}_{\cdot,j}
and \mathbf{x}_{\cdot,j}
are columns of \mathbf W
and
\mathbf X
, while
\mathbf{a}_{i,\cdot}
and
\mathbf{b}_{j,\cdot}
are
rows of \mathbf A
and \mathbf B
,
respectively. Thus, in the weighted case, the
\mathbf B
matrix is determined row by row by
calling the cmls
function for pre-processed
\mathbf A
matrix and columns of
\mathbf X
.
The problem we're actually interested in is a low-rank approximation
of \mathbf X
. It doesn't have a unique solution,
especially if the rank is more than 1
, unless we apply
constraints and some luck. We solve it by starting with (typically)
SVD and refining the solution with alternating least squares until it
satisfies the constraints:
\min_\mathbf{B}
|| (\mathbf X - \mathbf A \mathbf{B}^\top) \circ \mathbf W ||_\mathrm F
and
\min_\mathbf{A} ||
(\mathbf{X}^\top - \mathbf B \mathbf{A}^\top) \circ \mathbf{W}^\top
||_\mathrm F
.
Value
- cmf
-
An list of class
cmf
containing the\mathbf A, \mathbf B
matrices. - wcmls
-
The
\mathbf B
matrix solving the constrained weighted multivariate least squares problem. - fitted.cmf
-
A matrix reconstructed from its
nfac
-rank decomposition.
References
de Juan A, Jaumot J, Tauler R (2014). “Multivariate Curve Resolution (MCR). Solving the mixture analysis problem.” Anal. Methods, 6, 4964-4976. doi:10.1039/c4ay00571f.
See Also
Examples
data(feems)
z <- feemscatter(feems$a, rep(25, 4), 'omit')
str(zf <- albatross:::cmf(unclass(z)))
str(albatross:::fitted.cmf(zf))
Create a fluorescence excitation-emission matrix object
Description
Functions to create fluorescence excitation-emission matrix objects from
R matrices coupled with excitation and emission wavelengths, three-column
data.frame
s containing
(\lambda_\mathrm{em}, \lambda_\mathrm{ex}, I)
tuples or files.
Usage
feem(x, ...)
## S3 method for class 'matrix'
feem(x, emission, excitation, scale = 1, ...)
## S3 method for class 'data.frame'
feem(
x, scale = 1, emission = 'emission',
excitation = 'excitation', intensity = 'intensity', ...
)
## S3 method for class 'character'
feem(x, format, ...)
## S3 method for class 'connection'
feem(x, format, ...)
Arguments
x |
The source of the information to create a FEEM object from: a matrix,
a three-column If converting a matrix, its rows should correspond to different
fluorescence emission wavelengths specified in the If converting a If reading a single file by file path or connection, the |
emission |
If converting a matrix, this should be a vector of emission wavelengths, each wavelength corresponding to a row of the matrix. If converting a |
excitation |
If converting a matrix, this should be a vector of excitation wavelengths, each wavelength corresponding to a column of the matrix. If converting a |
intensity |
If converting a |
scale |
The scale value of a EEM is preserved through the analysis procedure to divide the resulting score values after running PARAFAC. If the EEM has been pre-multiplied prior to creating the FEEM object, you can set the multiplier here. |
format |
|
... |
When converting matrices and When reading the FEEM from a file, additional arguments may be passed
to |
Details
Transposing a feem
object using t
will
remove the class attribute, returning an ordinary matrix.
Value
A FEEM object is a matrix with the following attributes added:
emission |
Fluorescence emission wavelengths corresponding to the rows of the matrix, nm. |
excitation |
Fluorescence excitation wavelengths corresponding to the columns of the matrix, nm. |
dimnames |
Dimension names, copies of information above. Used only for presentation purposes. |
scale |
Scale factor, preserved through the analysis, which may be used
later to undo the scaling. Initially |
See Also
FEEM methods: plot.feem
, as.data.frame.feem
,
[.feem
, feemgrid
, feemife
,
feemscale
, feemscatter
.
Examples
feem(matrix(1:40, ncol = 8), 1:5, 1:8)
feem(
data.frame(x = 1:10, y = 21:30, z = 31:40),
emission = 'x', excitation = 'y', intensity = 'z'
)
feem(
system.file('extdata/ho_aq.csv', package = 'albatross'),
'table', sep = ','
)
feem(
system.file('extdata/F900.txt', package = 'albatross'), 'F900txt'
)
Core consistency diagnostic for PARAFAC models
Description
Compute the core consistency diagnostic (“CORCONDIA”) by fitting a “Tucker3” core array to the existing PARAFAC loadings.
Usage
feemcorcondia(
model, divisor = c("nfac", "core"),
kind = c('pinv', 'iterative', 'vec'), ...
)
## S3 method for class 'feemcorcondia'
print(x, ...)
Arguments
model |
A PARAFAC model returned by |
divisor |
The divisor used in computation of the CORCONDIA value, see Details. |
kind |
A string choosing the method used to compute the least squares Tucker3 core. Defaults to “pinv” for PARAFAC models without missing data and “iterative” for models where missing data is present. See Details. |
x |
An object returned by |
... |
|
Details
The “Tucker3” model uses three loading matrices and a small three-way “core array” to describe a larger three-way array:
X_{i,j,k} = \sum_r \sum_s \sum_t A_{i,r} B_{j,s} C_{k,t} G_{r,s,t}
It's easy to show that constraining
G_{r,s,t} = 1_{r = s = t}
makes the Tucker3 model equivalent to a PARAFAC model. The core
consistency diagnostic works by constraining the loading matrices of a
Tucker3 model to the existing loading matrices from a PARAFAC model
and estimating the core array. The closer the resulting
\mathbf{G}
tensor is to a diagonal one, the
better.
Given the least-squares estimated core tensor
\mathbf{G}
, the ideal core tensor
T_{r,s,t} = 1_{r = s = t}
and the denominator
D
, the CORCONDIA metric is defined as follows:
\left(
1 - \frac{
\sum_r \sum_s \sum_t (G_{r,s,t} - T_{r,s,t})^2
}{D}
\right) \cdot 100\%
The denominator can be chosen to be either
\sum_r \sum_s \sum_t T_{r,s,t}^2
, which is
equal to the number of factors in the model (divisor = 'nfac'
),
or
\sum_r \sum_s \sum_t G_{r,s,t}^2
, which will
avoid resulting negative values (divisor = 'core'
).
There are multiple ways how the least squares Tucker3 core can be
computed. When no data is missing, the matricised form of the model
can be used to derive the expression (kind = 'pinv'
, the
default in such cases):
\mathbf{X} =
\mathbf{A} \mathbf{G} (\mathbf{C} \otimes \mathbf{B})^\top
+ \epsilon
\hat{\mathbf{G}} =
\mathbf{A}^{+} \mathbf{X} (
(\mathbf{C}^\top)^{+} \otimes
(\mathbf{B}^\top)^{+}
)
With missing data present, a binary matrix of weights
\mathbf{W}
appears:
\min_\mathbf{G} \left\| \mathbf W \circ (
\mathbf{A} \mathbf{G} (\mathbf{C} \otimes \mathbf{B})^\top
- \mathbf{X}
) \right\|^2
A gradient-based method can be used to solve this problem iteratively
without allocating too much memory, but care must be taken to ensure
convergence. For kind = 'iterative'
(which is the default for
models with missing data), optim
is used with
parameters method = 'L-BFGS-B'
,
control = list(maxit = 1000, factr = 1)
. Warnings will be
produced if the method doesn't indicate successful convergence.
The problem can also be solved exactly by unfolding the tensor into a vector and skipping the elements marked as missing:
\min_\mathbf{G} \left\|
\mbox{vec}(
\mathbf{A} \mathbf{G} (\mathbf{C} \otimes \mathbf{B})^\top
)_{\mbox{non-missing}}
- \mbox{vec}(\mathbf{X})_{\mbox{non-missing}}
\right\|^2
\mbox{vec}(\mathbf{C} \otimes \mathbf{B} \otimes \mathbf{A})
_{\mbox{non-missing}}
\times \mbox{vec}\:\mathbf{G}
= \mbox{vec}(\mathbf{X})_{\mbox{non-missing}}
Unfortunately, when this method is used (kind = 'vec'
), the
left-hand side of the equation has the size of
\mathbf{X}
times the number of components cubed,
which grows very quickly for models with large numbers of components.
Value
A numeric scalar of class feemcorcondia
with the following
attributes:
divisor |
The |
core |
A three-way array containing the least-squares Tucker core for the given PARAFAC model. |
References
Bro R, Kiers HAL (2003). “A new efficient method for determining the number of components in PARAFAC models.” Journal of Chemometrics, 17(5), 274-286. ISSN 0886-9383, doi:10.1002/cem.801.
See Also
multiway::corcondia
Examples
data(feems)
cube <- feemscale(feemscatter(cube, c(20, 14)), na.rm = TRUE)
# kind = 'vec' is exact but may take a lot of memory
feemcorcondia(feemparafac(cube, nfac = 3, ctol = 1e-4), kind = 'vec')
# kind = 'iterative' used by default for models with missing data
feemcorcondia(feemparafac(cube, nfac = 4, ctol = 1e-4))
Data cubes of fluorescence excitation-emission matrices
Description
Given a list of feem
objects or a 3-way array, build
tagged 3-dimensional arrays of fluorescence excitation-emission
spectra. Extract the data cube from the corresponding model objects.
Transform the data cube into a list of feem
objects.
Usage
feemcube(x, ...)
## S3 method for class 'list'
feemcube(x, all.wavelengths, ...)
## S3 method for class 'array'
feemcube(x, emission, excitation, scales, names = NULL, ...)
## S3 method for class 'feemparafac'
feemcube(x, ...)
## S3 method for class 'feemsplithalf'
feemcube(x, ...)
## S3 method for class 'feemjackknife'
feemcube(x, ...)
## S3 method for class 'feemflame'
feemcube(x, ...)
## S3 method for class 'feemcube'
as.list(x, ...)
Arguments
x |
|
all.wavelengths |
Logical, a flag specifying whether to include wavelengths not present
in all of the samples. If |
emission |
Numeric vector of emission wavelengths. Should correspond to the
first dimension of the array |
excitation |
Numeric vector of excitation wavelengths. Should correspond to the
second dimension of the array |
scales |
Numeric vector of scale factors corresponding to the spectra in the
array. Should correspond to the third dimension of the array
|
names |
Character vector of names of the samples. Should correspond to the
third dimension of the array |
... |
Additional arguments besides those specified above are not allowed. |
Details
feemcube.list
can be used to build FEEM data cubes from lists
of FEEM objects even if their wavelength grids do not exactly match.
The missing wavelengths may be set to NA
(all.wavelengths = TRUE
) or omitted from the cube
(all.wavelengths = FALSE
). See feemgrid
if you
need to adjust the wavelength grid of a list of EEMs before making it
into a FEEM cube.
feemcube.feemparafac
, feemcube.jackknife
, and
feemcube.feemsplithalf
return the data cube originally passed
to the corresponding functions.
Value
A FEEM data cube is a numeric three-dimensional array with the following attributes:
emission |
Fluorescence emission wavelengths corresponding to the first dimension of the array, nm. |
excitation |
Fluorescence excitation wavelengths corresponding to the second dimension of the array, nm. |
dimnames |
Dimension names, copies of information above. Used only for presentation purposes. |
scales |
Scale factors of the samples, corresponding to the third dimension
of the array. Assumed to be |
as.list.feemcube
: A named list of FEEM objects comprising
x
.
See Also
FEEM cube methods: [.feemcube
,
plot.feemcube
, as.data.frame.feemcube
,
feemife
, feemscale
,
feemscatter
.
Examples
# array form
feemcube(
array(1:24, c(4, 3, 2)), # 3-way array obtained elsewhere
seq(340, 400, len = 4), seq(250, 300, len = 3) # wavelengths
)
# list form
feemcube(
replicate(2, feem( # list of feem objects
matrix(1:6, 2), c(340, 400), c(250, 275, 300)
), FALSE),
TRUE
)
str(as.list(feemcube(array(1:60, 3:5), 1:3, 1:4)))
Fluorescence and scAttering Model Estimation
Description
Given a FEEM cube, model the fluorescence and the scattering signals at the same time as a sum of a PARAFAC model and a low-rank unfolded matrix factorisation.
Usage
feemflame(
X, ffac, sfac, maxiter = 32, widths = rep(25, 4), Raman.shift = 3400,
ctol = 1e-04, progress = TRUE, control.parafac, control.cmf
)
## S3 method for class 'feemflame'
fitted(object, ...)
## S3 method for class 'feemflame'
residuals(object, ...)
## S3 method for class 'feemflame'
coef(
object, type = c(
"fluorescence",
"scores", "loadings", "emission", "excitation", "samples",
"scattering", "sc.scores", "sc.loadings"
), ...
)
## S3 method for class 'feemflame'
plot(
x, type = c('both', 'fl.image', 'fl.lines'), ...
)
Arguments
X |
A |
ffac |
The number of trilinear components used to model fluorescence,
passed to |
sfac |
The number of bilinear (low-rank matrix factorisation) components used to model the scattering signal. |
maxiter |
Maximum number of alternating PARAFAC and constrained matrix factorisation iterations. |
widths |
Widths of the scattering regions, like in
|
Raman.shift |
Raman shift of the scattering signal, in
|
ctol |
Given |
progress |
Print progress information on the console, including the iteration number, relative sum of squared residuals, and relative change in sum of squared residuals. |
control.parafac , control.cmf |
Named lists of additional arguments to be passed to the underlying
functions. Both default to |
object , x |
A |
type |
|
... |
No other parameters are allowed. |
Details
FLAME models the input data as a sum of fluorescence signal (PARAFAC model) and scattering signal (low rank model):
X_k(\lambda^\mathrm{em}_i, \lambda^\mathrm{ex}_j) =
\underbrace{\sum_p A_{i,p} B_{j,p} C_{k,p}}_{\mbox{fluorescence}}
+ \underbrace{\sum_q S_{i,j,q} D_{k,q}}_{\mbox{scattering}}
The function alternates between fitting the PARAFAC model on the
dataset with scattering signal subtracted and fitting the low-rank
model on the dataset with fluorescence signal subtracted. The PARAFAC
model is fitted using the feemparafac
function. The
low-rank model is fitted by means of unfolding the wavelength
dimensions into one, resulting in a matrix, followed by the same
alternating least squares procedure as done in multivariate curve
resolution. Both models are constrained to result in non-negative
factors.
The low-rank model is additionally constrained to zero outside the
scattering region. The scattering region is defined the same way as in
feemscatter
, using the widths
and the
Raman.shift
arguments.
Initial PARAFAC model is fitted with the scattering region set to missing. The low-rank model is initialised with truncated singular value decomposition forced to be non-negative.
Value
feemflame |
An object of class
|
fitted.feemflame |
A |
residuals.feemparafac |
A |
coef.feemflame |
See the description of the |
Note
The structure of the feemflame
object, the initialisation, and
the constraints may be subject to change in a future version.
References
Tauler R, Marqués I, Casassas E (1998). “Multivariate curve resolution applied to three-way trilinear data: Study of a spectrofluorimetric acid-base titration of salicylic acid at three excitation wavelengths.” Journal of Chemometrics, 12(1), 55-75. doi:10.1002/(SICI)1099-128X(199801/02)12:1<55::AID-CEM501>3.0.CO;2-#.
Krylov I, Labutin T, Rinnan Å, Bro R (2021). “Modelling of scattering signal for direct PARAFAC decompositions of excitation-emission matrices.” 17th Scandinavian Symposium on Chemometrics. https://web.archive.org/web/20220314144225/https://ssc17.org/abstract/Krylov1.html. https://files.libs.chem.msu.ru/~ivan/SSC17/P13.pdf.
See Also
Examples
data(feems)
cube <- feemscale(cube)
factors <- feemflame(cube, ffac = 3, sfac = 1)
str(coef(factors))
str(coef(factors, 'scattering'))
plot(factors)
Interpolate FEEMs on a given wavelength grid
Description
Use interpolation to change the wavelength grid of a single FEEM or unify the grid of a collection of them.
Usage
feemgrid(x, ...)
## S3 method for class 'feem'
feemgrid(
x, emission, excitation,
method = c("whittaker", "loess", "kriging", "pchip"), ...
)
## S3 method for class 'feemcube'
feemgrid(
x, emission, excitation, ..., progress = TRUE
)
## S3 method for class 'list'
feemgrid(
x, emission, excitation, ..., progress = TRUE
)
Arguments
x |
|
emission , excitation |
Desired wavelength grid, as numeric vectors. Must be specified for a single FEEM. If not specified for a collection of FEEMs, all wavelengths falling in the range of the intersection all wavelengths intervals are chosen. |
method |
Interpolation method, see |
... |
Passed from generics to |
progress |
Set to |
Details
The algorithm doesn't know how to distinguish between NA
s
that haven't been measured and NA
s that resulted from
combining different wavelength grids, so it tries to interpolate all
of them. As a result, leaving large areas of the spectrum undefined
(e.g. anti-Stokes area) is not recommended, since it would result in
extrapolation and introduce strong artefacts.
Value
An object of the same kind (FEEM object / FEEM cube / list of them) with emission and excitation wavelengths as requested.
See Also
Examples
data(feems)
x <- feemscatter(feems$a, rep(25, 4))
y <- feemgrid(x, seq(240, 600, 5), seq(230, 550, 10))
plot(plot(x, main = 'Original' ), split = c(1, 1, 2, 1), more = TRUE)
plot(plot(y, main = 'Interpolated'), split = c(2, 1, 2, 1))
Absorbance-based inner filter effect correction
Description
Use absorbance data to correct inner-filter effect in FEEM objects and collections of them.
Usage
feemife(x, ...)
## S3 method for class 'feem'
feemife(x, absorbance, abs.path = 1, ...)
## S3 method for class 'feemcube'
feemife(x, absorbance, abs.path, ..., progress = FALSE)
## S3 method for class 'list'
feemife(x, absorbance, abs.path, ..., progress = FALSE)
Arguments
x |
A FEEM object, a FEEM data cube, or a list of them. |
absorbance |
If Otherwise, this could be a list of such objects or a multi-column
matrix-like object. If |
abs.path |
If Otherwise, a named vector containing the names from If not set, assumed to be |
progress |
Set to |
... |
No parameters besides those described above are allowed. |
Details
If you receive errors alleging that some names don't match, but
are absolutely sure that the absorbance spectra and path lengths are
present in the same order as in x
, remove the names from either
of the objects, e.g. by passing unname(absorbance)
.
The formula used to correct for inner filter effect is:
I_\mathrm{corr}(\lambda_\mathrm{em}, \lambda_\mathrm{ex}) =
I_\mathrm{orig}(\lambda_\mathrm{em}, \lambda_\mathrm{ex}) 10^{
\frac{A(\lambda_\mathrm{em}) + A(\lambda_\mathrm{ex})}{2 L_\mathrm{abs}}
}
Value
An object of the same kind as x
, with inner filter effect
corrected.
References
Lakowicz JR (2006). Principles of Fluorescence Spectroscopy, 3rd ed.. Springer US. doi:10.1007/978-0-387-46312-4.
Kothawala DN, Murphy KR, Stedmon CA, Weyhenmeyer GA, Tranvik LJ (2013). “Inner filter correction of dissolved organic matter fluorescence.” Limnology and Oceanography: Methods, 11(12), 616-630. doi:10.4319/lom.2013.11.616.
Examples
data(feems)
str(cube)
str(absorp)
plot(feemife(cube,absorp) / cube)
Fluorescence indices and peak values
Description
Calculate fluorescence indices or peak values for individual FEEMs or groups of them.
Usage
feemindex(x, ...)
## S3 method for class 'feem'
feemindex(
x,
indices = c(
"HIX", "BIX", "MFI", "CFI", "YFI", "FrI",
"A", "B", "C", "M", "P", "T"
),
tolerance = 1, interpolate = FALSE, ...
)
## S3 method for class 'feemcube'
feemindex(x, ..., progress = FALSE)
## S3 method for class 'list'
feemindex(x, ..., progress = FALSE)
Arguments
x |
A FEEM, a FEEM cube, or a list of |
indices |
Fluorescence indices or peaks to return. By default, all indices and peaks known to the function are returned. See Details for their meaning. |
tolerance |
A numeric scalar signifying the acceptable emission and excitation
wavelength error in nm. For example, if a wavelength of |
interpolate |
A string specifying an interpolation method (“whittaker”,
“loess”, “kriging”, “pchip”), or If interpolation is disabled, an index will get an When interpolation is enabled, required points that are missing from
the grid or present but set to |
... |
Additional parameters eventually passed to interpolation methods.
See |
progress |
Set to |
Details
Available indices and peaks are:
- HIX
-
\mathrm{HIX} = \frac{ \int_{435 \, \mathrm{nm}}^{480 \, \mathrm{nm}} I \, d\lambda_\mathrm{em} }{ \int_{300 \, \mathrm{nm}}^{345 \, \mathrm{nm}} I \, d\lambda_\mathrm{em} } \; \mathrm{at} \; \lambda_\mathrm{ex} = 254 \, \mathrm{nm}
Higher values of the humification index correspond to more condensed fluorescing molecules (higher C/H), more humified matter. (Zsolnay, Baigar, Jimenez, Steinweg, and Saccomandi 1999)
- BIX
-
\mathrm{BIX} = \frac{ I(\lambda_\mathrm{em} = 380 \, \mathrm{nm}) }{ I(\lambda_\mathrm{em} = 430 \, \mathrm{nm}) } \; \mathrm{at} \; \lambda_\mathrm{ex} = 310 \, \mathrm{nm}
Index of recent autochthonous contribution determines the presence of the
\beta
fluorophore, characteristic of autochthonous biological activity in water samples. (Huguet, Vacher, Relexans, Saubusse, Froidefond, and Parlanti 2009) - MFI
-
\mathrm{MFI} = \frac{ I(\lambda_\mathrm{em} = 450 \, \mathrm{nm}) }{ I(\lambda_\mathrm{em} = 500 \, \mathrm{nm}) } \; \mathrm{at} \; \lambda_\mathrm{ex} = 370 \, \mathrm{nm}
The fluorescence index by (McKnight, Boyer, Westerhoff, Doran, Kulbe, and Andersen 2001) helps distinguish sources of isolated aquatic fulvic acids and may indicate their aromaticity.
- CFI
-
\mathrm{CFI} = \frac{ I(\lambda_\mathrm{em} = 470 \, \mathrm{nm}) }{ I(\lambda_\mathrm{em} = 520 \, \mathrm{nm}) } \; \mathrm{at} \; \lambda_\mathrm{ex} = 370 \, \mathrm{nm}
The fluorescence index by (Cory and McKnight 2005) is correlated to relative contribution of microbial versus higher plant-derived organic matter to the DOM pool.
- YFI
-
\mathrm{YFI} = \frac{ \bar{I}(\lambda_\mathrm{em} \in [350, 400] \, \mathrm{nm}) }{ \bar{I}(\lambda_\mathrm{em} \in [400, 450] \, \mathrm{nm}) } \; \mathrm{at} \; \lambda_\mathrm{ex} = 280 \, \mathrm{nm}
Yeomin fluorescence index (Heo, Yoon, Kim, Lee, Lee, and Her 2016) is lowest for humic-like and fulvic-like samples, higher for aminosugar-like samples and highest for protein-like samples.
- FrI
-
\mathrm{FrI} = \frac{ I(\lambda_\mathrm{em} = 380 \, \mathrm{nm}) }{ \max I(\lambda_\mathrm{em} \in [420, 435] \, \mathrm{nm}) } \; \mathrm{at} \; \lambda_\mathrm{ex} = 310 \, \mathrm{nm}
The freshness index, also known as
\frac{\beta}{\alpha}
, is an indicator of autochthonous inputs (Wilson and Xenopoulos 2009) and may provide indication of relative contribution of microbially produced DOM. - A, B, C, M, P, T
-
Fluorophore peaks taken from (Coble 2007):
Peak \lambda_\mathrm{ex}
\lambda_\mathrm{em}
Fluorescence A 260 400-460 humic-like B 275 305 tyrosine-like C 320-360 420-460 humic-like M 290-310 370-410 marine humic-like P 398 660 pigment-like T 275 340 tryptophan-like When a range of wavelengths specified in one or both axes, the maximal signal value over that range is taken.
Positions of the peaks and the areas used to determine the
fluorescence indices of an EEM. The Rayleigh and Raman scattering
areas for both 1st and 2nd diffraction orders are shown in grey,
assuming a width of \pm 20
nm and a Raman
shift of 3400 \: \mathrm{cm}^{-1}
. The
tolerance interval of \pm 1
nm is invisible
at the scale of the figure.
Integration for HIX and YFI is done using the trapezoidal method:
\int_a^b f(x) dx \approx (b - a) \frac{f(a) + f(b)}{2}
Value
For individual feem
objects, a named numeric vector
containing the values requested via the indices
argument.
Otherwise, a data.frame
containing the values from
the vectors above and a column named sample
containing the
names of the samples (or numbers, if names were absent).
Author(s)
With edits and suggestions by Anastasia Drozdova.
References
Coble PG (2007). “Marine Optical Biogeochemistry: The Chemistry of Ocean Color.” Chemical Reviews, 107(2), 402-418. doi:10.1021/cr050350+.
Cory RM, McKnight DM (2005). “Fluorescence spectroscopy reveals ubiquitous presence of oxidized and reduced quinones in dissolved organic matter.” Environmental science & technology, 39(21), 8142-8149. doi:10.1021/es0506962.
Heo J, Yoon Y, Kim D, Lee H, Lee D, Her N (2016). “A new fluorescence index with a fluorescence excitation-emission matrix for dissolved organic matter (DOM) characterization.” Desalination and Water Treatment, 57(43), 20270-20282. doi:10.1080/19443994.2015.1110719.
Huguet A, Vacher L, Relexans S, Saubusse S, Froidefond JM, Parlanti E (2009). “Properties of fluorescent dissolved organic matter in the Gironde Estuary.” Organic Geochemistry, 40(6), 706-719. doi:10.1016/j.orggeochem.2009.03.002.
McKnight DM, Boyer EW, Westerhoff PK, Doran PT, Kulbe T, Andersen DT (2001). “Spectrofluorometric characterization of dissolved organic matter for indication of precursor organic material and aromaticity.” Limnology and Oceanography, 46(1), 38-48. doi:10.4319/lo.2001.46.1.0038.
Wilson HF, Xenopoulos MA (2009). “Effects of agricultural land use on the composition of fluvial dissolved organic matter.” Nature Geoscience, 2(1), 37-41. doi:10.1038/ngeo391.
Zsolnay A, Baigar E, Jimenez M, Steinweg B, Saccomandi F (1999). “Differentiating with fluorescence spectroscopy the sources of dissolved organic matter in soils subjected to drying.” Chemosphere, 38(1), 45-50. doi:10.1016/S0045-6535(98)00166-0.
See Also
Examples
data(feems)
x <- feemscatter(feems$a, rep(25, 4), 'omit')
feemindex(x)
feemindex(x, interpolate = 'whittaker')
feemindex(feems[2:3])
feemindex(feemcube(feems[4:5], TRUE))
Jack-knife outlier detection in PARAFAC models
Description
Perform leave-one-out fitting + validation of PARAFAC models on a given FEEM cube.
Usage
feemjackknife(cube, ..., progress = TRUE)
## S3 method for class 'feemjackknife'
plot(
x, kind = c('estimations', 'RIP', 'IMP'), ...
)
## S3 method for class 'feemjackknife'
coef(
object, kind = c('estimations', 'RIP', 'IMP'), ...
)
Arguments
cube |
A |
progress |
Set to |
x , object |
An object returned by |
kind |
Chooses what to plot (when called as
|
... |
|
Details
The function takes each sample out of the dataset, fits a PARAFAC model without it, then fits the outstanding sample to the model with emission and excitation factors fixed:
\hat{\mathbf{c}} =
(\mathbf{A} \ast \mathbf{B})^{+} \times \mathrm{vec}(\mathbf{X})
The individual leave-one-out models (fitted loadings
\mathbf A
, \mathbf B
and scores
\mathbf C
) are reordered according to best Tucker's
congruence coefficient match and rescaled by minimising
|| \mathbf A \, \mathrm{diag}(\mathbf s_\mathrm A) -
\mathbf A^\mathrm{orig} ||^2
and
|| \mathbf{B} \, \mathrm{diag}(\mathbf s_\mathrm B) -
\mathbf B^\mathrm{orig} ||^2
over \mathbf s_\mathrm A
and
\mathbf s_\mathrm B
, subject to
\mathrm{diag}(\mathbf s_\mathrm A) \times
\mathrm{diag}(\mathbf s_\mathrm B) \times
\mathrm{diag}(\mathbf s_\mathrm C) = \mathbf I
, to make them comparable.
Once the models are fitted, resample influence plots and identity match plots can be produced from resulting data to detect outliers.
To conserve memory, feemjackknife
puts the user-provided
cube
in an environment and passes it via envir
and
subset
options of feemparafac
. This means that,
unlike in feemparafac
, the cube
argument has
to be a feemcube
object and passing envir
and
subset
options to feemjackknife
is not supported. It
is recommended to fully name the parameters to be passed to
feemparafac
to avoid problems.
plot.feemjackknife
provides sane defaults for
xyplot
parameters xlab
, ylab
,
scales
, as.table
, but they can be overridden.
Value
- feemjackknife
-
A list of class
feemjackknife
containing the following entries:- overall
-
Result of fitting the overall
cube
withfeemparafac
. - leaveone
-
A list of length
dim(cube)[3]
containing the reduced dataset components. Everyfeemparafac
object in the list has an additionalChat
attribute containing the result of fitting the excluded spectrum back to the loadings of the reduced model.
- plot.feemjackknife
-
A lattice plot object. Its
print
orplot
method will draw the plot on an appropriate plotting device. - coef.feemjackknife
-
A
data.frame
containing various columns, depending on the value of thekind
argument:- estimations
-
- loading
Values of the loadings.
- mode
-
The axis of the loadings, “Emission” or “Excitation”.
- wavelength
-
Emission or excitation wavelength the loading values correspond to.
- factor
The component number.
- omitted
-
The sample (name if
cube
had names, integer if it didn't) that was omitted to get the resulting loading values.
- RIP
-
- msq.resid
-
Mean squared residual value for the model with a given sample omitted.
- Emission
-
Mean squared difference in emission mode loadings between the overall model and the model with a given sample omitted.
- Excitation
-
Mean squared difference in excitation mode loadings between the overall model and the model with a given sample omitted.
- omitted
-
The sample (name if
cube
had names, integer if it didn't) that was omitted from a given model.
- IMP
-
- score.overall
Score values for the overall model.
- score.predicted
-
Score values estimated from the loadings of the model missing a given sample.
- factor
The component number.
- omitted
-
The sample (name if
cube
had names, integer if it didn't) that was omitted from a given model.
References
Riu J, Bro R (2003). “Jack-knife technique for outlier detection and estimation of standard errors in PARAFAC models.” Chemometrics and Intelligent Laboratory Systems, 65(1), 35-49. doi:10.1016/S0169-7439(02)00090-4.
See Also
Examples
data(feems)
cube <- feemscale(feemscatter(cube, rep(14, 4)), na.rm = TRUE)
# takes a long time; the stopping criterion is weaked for speed
jk <- feemjackknife(cube, nfac = 3, ctol = 1e-4)
# feemparafac methods should be able to use the environment and subset
plot(jk$leaveone[[1]])
plot(jk)
plot(jk, 'IMP')
plot(jk, 'RIP')
head(coef(jk))
Create lists of FEEM objects
Description
Convert vectors of file names or objects from other packages (such as
eemR or EEM) into flat named lists of
feem
objects.
Usage
feemlist(x, ...)
## S3 method for class 'character'
feemlist(
x, format, pattern = NULL, recursive = TRUE, ignore.case = FALSE,
simplify.names = TRUE, progress = TRUE, ...
)
## S3 method for class 'eemlist'
feemlist(x, ...)
## S3 method for class 'EEM'
feemlist(x, ...)
Arguments
x |
A character vector containing names of files and directories to
import using Alternatively, an |
format |
Corresponds to the Alternatively, can be a function that takes a path to a file and
anything passed in |
pattern , recursive , ignore.case |
These options are passed to |
simplify.names |
If |
progress |
If true, show a text progress bar using
|
... |
When importing files, remaining options are passed to
|
Details
Names of x
are preserved; if x
is not named, names are
assigned from the values of x
itself, and so are empty names
in partially-named x
. Every directory in x
is replaced
with its contents (as returned by list.files
),
their names obtained by concatenating the name of the directory
element with their paths inside the directory (with
.Platform$file.sep
as a separator). For example,
when importing x = c('foo' = 'bar')
with directory ‘bar’
containing ‘baz.txt’, resulting name would be ‘foo/baz.txt’.
When importing many files from the same directory, the
simplify.names
option is useful to avoid duplication in resulting
names. For example,
feemlist('.', simplify.names = FALSE)
results in a list with
all names starting with ./
, while
feemlist('foo/bar/baz', simplify.names = TRUE)
(default) would
shave off all three common path components and the separators.
Mixing files and directories in x
will most likely not preserve
the order of the elements.
Note: Please don't rely on the name generation behaving exactly as specified as it may be changed in the future versions.
When importing custom file formats, the format
function should
typically take the following form:
function(filename, ...) { # read data from filename # take additional arguments passed from feemlist(...) if needed return(feem(data)) }
Value
A flat named list of feem
objects.
See Also
feem
; the packages eemR and EEM.
Examples
feemlist(
system.file('extdata/pano2.txt', package = 'albatross'),
'table', transpose = TRUE, na = 0
)
if (requireNamespace('eemR')) feemlist(eemR::eem_read(
system.file('extdata/ho_aq.csv', package = 'albatross'),
import_function='aqualog'
))
if (requireNamespace('EEM')) feemlist(EEM::readEEM(
system.file('extdata/ho_aq.dat', package = 'albatross')
))
feemlist(
system.file('extdata/custom.rds', package = 'albatross'),
readRDS
)
Compute PARAFAC on a FEEM cube object and access the results
Description
feemparafac
forwards its arguments to
parafac
from the multiway package,
optionally rescales the result and attaches a few attributes.
Resulting objects of class feemparafac
can be accessed using
methods presented below.
Usage
feemparafac(
X, ..., const = rep('nonneg', 3), ctol = 1e-6,
rescale = 3, retries = 10, subset = TRUE, envir = NULL
)
## S3 method for class 'feemparafac'
plot(x, type = c("image", "lines"), ...)
## S3 method for class 'feemparafac'
coef(
object, type = c(
"all", "scores", "loadings", "surfaces",
"emission", "excitation", "samples"
), ...
)
## S3 method for class 'feemparafac'
fitted(object, ...)
## S3 method for class 'feemparafac'
residuals(object, ...)
## S3 method for class 'feemparafac'
reorder(x, neworder, like, ...)
## S3 method for class 'feemparafac'
rescale(x, mode, newscale, absorb, like, ...)
Arguments
X |
A FEEM cube object. The per-sample factors will be multiplied by
If |
... |
|
const |
A character vector of length 3 specifying the constraints for all
modes of |
ctol |
The stopping criterion used by |
rescale |
Rescale the resulting factors to leave all the variance in the given
mode: emission, excitation, or sample (default). Set to |
retries |
Retry for given number of tries until
|
subset |
An integer or logical vector choosing the samples from |
envir |
An environment to look up |
x , object |
An object returned by |
type |
Given a fitted PARAFAC model:
With
Fitted PARAFAC coefficients can be returned in the following forms:
|
neworder |
A permutation of integers between |
like |
A In In |
mode |
The modes to rescale, with |
newscale |
The desired root-mean-square for each column of the modes being
rescaled.
Forwarded to |
absorb |
The mode that should absorb the inverse rescaling coefficients.
When |
Details
feemparafac
tries hard to guarantee the convergence flag to be
0
(normal convergence) or 1
(iteration number limit
reached), but never 2
(a problem with the constraints). A fatal
error is raised if repeated runs of parafac
do
not return a (semi-)successfully fitted model.
After the PARAFAC decomposition is calculated, the scores are
multiplied by the scales
attribute of the X
object,
making them represent the cube with scaling undone. Use
feemscale(remember = FALSE)
if you don't want to undo
the scaling.
The output
option is fixed to "best"
value. Obtaining
a list of alternative solutions can therefore be achieved by running:
replicate(n, feemparafac(..., nstart = 1), simplify = FALSE)
The subset
and envir
options are useful to repeatedly
perform PARAFAC on different subsets of the same FEEM cube, e.g. in
jack-knifing or split-half analysis. Since feemparafac
keeps
a reference to the its X
and envir
arguments, the use
of subset
should ensure that the same FEEM cube is referenced
from multiple feemparafac
objects instead of creating copies
of its subsets. Additionally, environment objects are not duplicated
on save
or load
, so storing
X
in an environment and passing it to multiple invocations of
feemparafac
will save a lot of memory when the results are
serialised together.
plot.feemparafac
provides sane defaults for lattice
options such as xlab
, ylab
, as.table
,
auto.key
, type
, cuts
, col.regions
, but
they can be overridden.
Value
feemparafac |
An object of classes
Use |
plot.feemparafac |
A lattice plot object. Its |
coef.feemparafac |
A |
fitted.feemparafac |
A |
resid.feemparafac |
A |
References
Bro R (1997). “PARAFAC. Tutorial and applications.” Chemometrics and Intelligent Laboratory Systems, 38(2), 149-171. doi:10.1016/S0169-7439(97)00032-4.
See Also
The parafac
class structure;
write.openfluor
, feemcube
for methods
specific to values returned from this function.
The rescale
generic is re-exported from the
multiway package.
Examples
data(feems)
cube <- feemscale(feemscatter(cube, c(24, 14)), na.rm = TRUE)
(factors <- feemparafac(cube, nfac = 3, ctol = 1e-4))
plot(factors, 'image')
plot(factors, 'line')
head(coef(factors, 'loadings'))
str(coef(factors, 'all'))
str(feemcube(factors)) # original cube is retained
plot(fitted(factors))
plot(resid(factors))
Synthetic fluorescence excitation-emission matrices and absorbance spectra
Description
This dataset consists of twelve fluorescence and absorbance spectra simulated from three trilinear components, with scattering signal added and divided by a correction factor to simulate inner filter effect.
Usage
data("feems")
Format
- feems
-
A named list of 12
feem
objects containing fluorescence data measured with excitation wavelengths between230
nm and350
nm (with a step of2
nm) and emission wavelengths between240
nm and435
(with a step of5
nm). - cube
-
A 12-sample
feemcube
object consisting of of 32 by 10 FEEMs measured at the same wavelength range as above with inner filter effect corrected. - absorp
-
A 12-element named list containing absorbance spectra measured between
230
and450
nm in1
cm cells. Each element of the list is a two-column matrix. The first column contains the wavelengths and the second column contains the absorbance values.
Examples
data(feems)
plot(cube)
plot(feems$a)
matplot(
absorp[[1]][,1],
sapply(absorp, function(x) x[,2]),
type = 'l', lty = 1
)
Rescale FEEM spectra to a given norm and remember the scale factor
Description
Given a norm function (typically, standard deviation), scale the intensities in FEEM objects to it and optionally remember the scale factor.
Usage
feemscale(x, ...)
## S3 method for class 'feem'
feemscale(x, norm = sd, remember = TRUE, ...)
## S3 method for class 'feemcube'
feemscale(x, ..., progress = FALSE)
## S3 method for class 'list'
feemscale(x, ..., progress = FALSE)
Arguments
x |
A FEEM object, a FEEM cube object, or a list of anything compatible
with |
norm |
A function taking a numeric matrix and returning its norm. Typically,
|
remember |
Whether to remember the scale factor. If |
... |
Passed as-is to |
progress |
Set to |
Value
feemscale.feem
: a FEEM object with intensities divided by scale
factor (norm(x)
) and its scale
attribute multiplied by
the scale factor.
feemscale.feemcube
: a FEEM cube built from FEEM objects scaled
as described above.
feemscale.list
: a list consisting of results of
feemscale
generic applied to its elements.
References
Bro R, Smilde AK (2003). “Centering and scaling in component analysis.” Journal of Chemometrics, 17(1), 16-33. doi:10.1002/cem.773.
See Also
Examples
feemscale(feem(matrix(1:42, 6), 1:6, 1:7))
Handle scattering signal in FEEMs
Description
Remove or interpolate scattering signal in individual FEEM objects, FEEM cube objects, or lists of them.
Usage
feemscatter(x, ...)
## S3 method for class 'list'
feemscatter(x, ..., cl, progress = TRUE)
## S3 method for class 'feemcube'
feemscatter(x, ..., cl, progress = TRUE)
## S3 method for class 'feem'
feemscatter(
x, widths, method = c("omit", "pchip", "loess", "kriging", "whittaker"),
add.zeroes = 30, Raman.shift = 3400, ...
)
Arguments
x |
An individual FEEM object, FEEM cube object, or a list of them, to handle the scattering signal in. |
widths |
A numeric vector or a list containing the half-widths of the scattering bands, in nm. Rayleigh scattering is followed by Raman scattering, followed by second diffraction order for Rayleigh and Raman, and so on. (Typically, there's no need for anything higher than third order, and even that is rare.) For example:
For higher diffraction orders, the peak widths are proportionally
scaled, making it possible to provide the same number for all kinds
of scattering visible in the EEM. Set a width to It's possible to specify the bands asymmetrically. If the area to be
corrected should range from x nm to the left of the scattering
peak to y nm to the right of it, pass a list instead of a
vector, and put a two-element vector To sum up, given two half-widths
In this example, a much larger portion of the anti-Stokes area is
removed near the first order Rayleigh scattering signal than in
the Stokes area. This can be useful to get rid of undesired signal
where no fluorescence is observed on some spectrometers. The
second and third order scattering signal areas are automatically
scaled |
method |
A string choosing how to handle the scattering signal:
|
add.zeroes |
Set intensities at |
Raman.shift |
Raman shift of the scattering signal of water,
|
... |
Passed verbatim from If “pchip” method is selected, the If “loess” method is selected, remaining options are passed
to If “kriging” method is selected, remaining options are passed
to If “whittaker” method is selected, available parameters
include |
cl |
If not |
progress |
Set to |
Details
The “pchip” method works by default as described in
(Bahram, Bro, Stedmon, and Afkhami 2006): each emission spectrum at different excitation
wavelengths is considered one by one. Zeroes are inserted in the
corners of the spectrum if they are undefined (NA
) to prevent
extrapolation from blowing up, then the margins are interpolated using
the corner points, then the rest of the spectrum is interpolated line
by line. Since pchip
requires at least 3 points
to interpolate, the function falls back to linear interpolation if it
has only two defined points to work with. The by
argument
controls whether the function proceeds by rows of the matrix
(“emission”, default), by columns of the matrix
(“excitation”), or does both (“both”) and averages the
results to make the resulting artefacts less severe (Pucher, Wünsch, Weigelhofer, Murphy, Hein, and Graeber 2019)
(see the staRdom package itself).
The “loess” method feeds the whole FEEM except the area to be
interpolated to loess
, then asks it to predict
the remaining part of the spectrum. Any negative values predicted by
loess
are replaced by 0
.
The “kriging” method (Press, Teukolsky, Vetterling, and Flannery 2007) is much more
computationally expensive than the previous two, but, on some spectra,
provides best results, not affected by artefacts resulting from
line-by-line one-dimensional interpolation (pchip
) or varying
degrees of smoothness in different areas of the spectrum
(loess
). Any negative values returned by
kriging
are replaced by 0
.
Whittaker smoothing
The “whittaker” method (Krylov and Labutin 2023) works by minimising a
sum of penalties, requiring the interpolated surface to be close to
the original points around it and to be smooth in terms of derivatives
by
\lambda_\mathrm{em}
and
\lambda_\mathrm{ex}
.
The parameters d
and lambda
should be numeric vectors
of the same length, corresponding to the derivative orders (whole
numbers \ge 1
) and their respective penalty weights (small
real numbers; larger is smoother). For interpolation purposes, the
default penalty is
10^{-2} \mathbf{D}_1 + 10 \mathbf{D}_2
,
which corresponds to d = 1:2
and lambda = c(1e-2, 10)
.
Any resulting negative values are pulled towards 0
by adding
zero-valued points with weight nonneg
(default 1
) and
retrying. Set nonneg
to 0
to disable this behaviour. It
is also possible to deal with resulting negative values by scaling and
shifting the signal between logscale
(typically) and 1
,
interpolating the logarithm of the signal, then undoing the
transformation. By default logscale
is NA
, disabling
this behaviour, since it may negatively affect the shape of
interpolated signal.
See the internal help page whittaker2 for implementation details.
Value
An object of the same kind (FEEM object / FEEM cube / list of them) with scattering signal handled as requested.
References
Bahram M, Bro R, Stedmon C, Afkhami A (2006). “Handling of Rayleigh and Raman scatter for PARAFAC modeling of fluorescence data using interpolation.” Journal of Chemometrics, 20(3-4), 99-105. doi:10.1002/cem.978.
Krylov IN, Labutin TA (2023). “Recovering fluorescence spectra hidden by scattering signal: in search of the best smoother.” Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 122441. doi:10.1016/j.saa.2023.122441.
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007). “Interpolation by Kriging.” In Numerical recipes: The Art of Scientific Computing (3rd Ed.), chapter 3.7.4, 144-147. Cambridge University Press, New York.
Pucher M, Wünsch U, Weigelhofer G, Murphy K, Hein T, Graeber D (2019). “staRdom: Versatile Software for Analyzing Spectroscopic Data of Dissolved Organic Matter in R.” Water, 11(11), 2366. doi:10.3390/w11112366.
Thygesen LG, Rinnan Å, Barsberg S, Møller JKS (2004). “Stabilizing the PARAFAC decomposition of fluorescence spectra by insertion of zeros outside the data area.” Chemometrics and Intelligent Laboratory Systems, 71(2), 97-106. ISSN 0169-7439, doi:10.1016/j.chemolab.2003.12.012.
See Also
Examples
data(feems)
plot(x <- feemscatter(
feems[[1]], widths = c(25, 25, 20, 20),
method = 'whittaker', Raman.shift = 3500
))
Split-half analysis of PARAFAC models
Description
This function validates PARAFAC with different numbers of components by means of splitting the data cube in halves, fitting PARAFAC to them and comparing the results (DeSarbo 1984).
Usage
feemsplithalf(
cube, nfac, splits, random, groups, fixed, ..., progress = TRUE
)
## S3 method for class 'feemsplithalf'
plot(
x, kind = c('tcc', 'factors', 'aggtcc', 'bandfactors'), ...
)
## S3 method for class 'feemsplithalf'
print(x, ...)
## S3 method for class 'feemsplithalf'
coef(
object, kind = c('tcc', 'factors', 'aggtcc', 'bandfactors'), ...
)
Arguments
cube |
A |
nfac |
An integer vector of numbers of factors to check. |
splits |
A scalar or a two-element vector consisting of whole numbers. The first element is the number of parts to split the data cube into, which must be even. After splitting, the parts are recombined into non-intersecting halves (Murphy, Stedmon, Graeber, and Bro 2013), which are subjected to PARAFAC decomposition and compared against each other. The second element, if specified, limits the total number of comparisons between the pairs, since the number of potential ways to recombine the parts of the data cube into halves grows very quickly. The number of PARAFAC models fitted is
Mutually incompatible with the parameters |
random |
Number of times to shuffle the dataset, split into non-intersecting halves, fit a PARAFAC model to each of the halves and compare halves against each other (Krylov, Drozdova, and Labutin 2020). The number of PARAFAC models fitted is
Mutually incompatible with the parameters |
groups |
Use this argument to preserve the ratios between the groups present
in the original dataset when constructing the halves. If specified,
must be a factor or an integer vector of length For the split-combine method ( Mutually incompatible with the |
fixed |
Use this argument to manually specify the contents of the halves to test. The argument must be a list containing two-element lists specifying the halves to compare. Each half must be a vector consisting of whole numbers specifying sample indices in the cube (see the example). It is considered an error to specify a sample in both halves. Mutually incompatible with the parameters |
progress |
Set to FALSE to disable the progress bar. |
x , object |
An object returned by |
kind |
Chooses what type of data to return or plot:
|
... |
|
Details
As the models (loadings \mathbf A
,
\mathbf B
and scores \mathbf C
)
are fitted, they are compared to the first model of the same number
of factors (Tucker's congruence coefficient is calculated using
congru
for emission and excitation mode
factors, then the smallest value of the two is chosen for the purposes
of matching). The models are first reordered according to the best
match by TCC value, then rescaled (Riu and Bro 2003) by minimising
|| \mathbf A \, \mathrm{diag}(\mathbf s_\mathrm A) -
\mathbf A^\mathrm{orig} ||^2
and
|| \mathbf{B} \, \mathrm{diag}(\mathbf s_\mathrm B) -
\mathbf B^\mathrm{orig} ||^2
over \mathbf s_\mathrm A
and
\mathbf s_\mathrm B
, subject to
\mathrm{diag}(\mathbf s_\mathrm A) \times
\mathrm{diag}(\mathbf s_\mathrm B) \times
\mathrm{diag}(\mathbf s_\mathrm C) = \mathbf I
, to make them comparable.
To perform stratified sampling on a real-valued variable (e.g. salinity,
depth), consider binning samples into groups using
cut
, perhaps after histogram flattening using
ecdf(x)(x)
. To determine the number of breaks, consider
nclass.Sturges
.
To conserve memory, feemsplithalf
puts the user-provided
cube
in an environment and passes it via envir
and
subset
options of feemparafac
. This means that,
unlike in feemparafac
, the cube
argument has
to be a feemcube
object and passing envir
and
subset
options to feemsplithalf
is not supported.
Instead of forwarding the arguments parallel
, cl
to
multiway::parafac
, feemsplithalf
schedules the calls to feemparafac
on the cluster by
itself. This makes it possible to fit more than nstart
models
at the same time if enough nodes are present in the parallel
cluster cl
.
plot.feemsplithalf
plots results of the split-half procedure
(TCC or loading values depending on the kind
argument)
using lattice graphics. Sane defaults are provided for
xyplot
parameters xlab
, ylab
,
as.table
, but they can be overridden.
print.feemsplithalf
displays a very short summary of the
analysis, currently the minimum TCC value for each number of components.
coef.feemsplithalf
returns the Tucker's congruence
coefficients resulting from the split-half analysis.
Value
- feemsplithalf, print.feemsplithalf
-
An object of class
feemsplithalf
, containing named fields:- factors
-
A
list
offeemparafac
objects containing the factors of the halves. The list has dimensions, the first one corresponding to the halves (always 2), the second to different numbers of factors (as many as innfac
) and the third to different groupings of the samples (depends onsplits
orrandom
). - tcc
-
A named list containing arrays of Tucker's congruence coefficients between the halves. Each entry in the list corresponds to an element in the
nfac
argument. The dimensions of each array in the list correspond to, in order: the factors (1 tonfac[i]
), the modes (emission or excitation) and the groupings of the samples (depending onsplits
orrandom
). - nfac
-
A copy of
nfac
argument.
- plot.feemsplithalf
-
A lattice plot object. Its
print
orplot
method will draw the plot on an appropriate plotting device. - coef.feemsplithalf
-
A
data.frame
containing various columns, depending on the value of thekind
argument:- tcc
-
- factor
-
The factor (out of
nfac
) under consideration. - tcc
-
Tucker's congruence coefficient between a pair of matching components. Out of two possible values (TCC between excitation loadings or emission loadings), the minimal one is chosen, because the same rule is used to find which components match when reordering them in a pair of models.
- test
-
The sequence number for each pair of models in the split-half test, related to the third dimension of
object$factors
orobject$tcc
. May be used to group values for plotting or aggregation. - subset
-
Consists of two-element lists containing indices of the samples in each half of the original cube.
- nfac
-
The number of factors in the pair of models under consideration.
- factors
-
- wavelength
-
Emission and excitation wavelengths.
- value
-
The values of the loadings.
- factor
-
Number of the factor,
1
tonfac
. - mode
-
The mode the loading value belongs to, “Emission” or “Excitation”.
- nfac
-
Total number of factors.
- test
-
Sequence number of a split-half test, indicating a given way to split the dataset in a group of splits with the same numbers of factors.
- half
-
Number of the half,
1
or2
. - subset
-
For every row, this is an integer vector indicating the subset of the original data cube that the loadings have been obtained from.
- aggtcc
-
The columns
tcc
,nfac
,test
after aggregation ofcoef(kind = 'tcc')
. - bandfactors
-
Columns
wavelength
,factor
,mode
,nfac
fromcoef(kind = 'factors')
, plus columnslower
,estimate
,upper
signifying the outputs from the aggregation function.
References
DeSarbo WS (1984). “An Application of PARAFAC to a Small Sample Problem, Demonstrating Preprocessing, Orthogonality Constraints, and Split-Half Diagnostic Techniques (Appendix).” Research Methods for Multimode Data Analysis, 602-642. https://papers.ssrn.com/abstract=2783446.
Krylov I, Drozdova A, Labutin T (2020). “Albatross R package to study PARAFAC components of DOM fluorescence from mixing zones of arctic shelf seas.” Chemometrics and Intelligent Laboratory Systems, 207(104176). doi:10.1016/j.chemolab.2020.104176.
Murphy KR, Stedmon CA, Graeber D, Bro R (2013). “Fluorescence spectroscopy and multi-way techniques. PARAFAC.” Analytical Methods, 5, 6557-6566. doi:10.1039/c3ay41160e.
Riu J, Bro R (2003). “Jack-knife technique for outlier detection and estimation of standard errors in PARAFAC models.” Chemometrics and Intelligent Laboratory Systems, 65(1), 35-49. doi:10.1016/S0169-7439(02)00090-4.
See Also
feemparafac
, parafac
,
congru
, feemcube
.
Examples
data(feems)
cube <- feemscale(feemscatter(cube, rep(14, 4)), na.rm = TRUE)
(sh <- feemsplithalf(
cube, 1:4, splits = 4, # => S4C6T3
# splits = c(4, 2) would be S4C4T2, and so on
# the rest is passed to multiway::parafac;
ctol = 1e-4
# here we set a mild stopping criterion for speed;
# be sure to use a stricter one for real tasks
))
# specifying fixed halves to compare as list of 2-element lists
fixed <- list(
list(1:6, 7:12),
list(seq(1, 11, 2), seq(2, 12, 2))
)
sh.f <- feemsplithalf(cube, 2:3, fixed = fixed, ctol = 1e-4)
plot(sh, 'aggtcc')
head(coef(sh, 'factors'))
Perceptually uniform palettes
Description
Create perceptually continuous palettes of R colours.
Usage
marine.colours(
n, chroma = 0.65, luminance = c(0.35, 1),
alpha = 1, gamma = 1, fixup = TRUE
)
diverging.colours(
n, chroma = c(.1, .75), luminance = c(1, .35),
alpha = 1, gamma = 1, fixup = TRUE
)
Arguments
n |
Number of colours to return. |
chroma |
Specifies the chroma (how saturated should the colours be) for the
palette, a real number between |
luminance |
Specifies the luminance (how bright should the colours be) of the
colours constituting the palette. Typically, a two-element vector
of real numbers between |
alpha |
Specifies the transparency of the colours of the palette. As above,
can be a fixed number or a two-element vector in the range
|
gamma |
Provides the power coefficient for the hue/chroma/luminance/alpha
growth formulae. May be useful when it is needed to sacrifice the
perceptual linearity of the palette to provide more contrast for
smaller or bigger values on the plot. The gamma-corrected values
are obtained by computing
|
fixup |
Whether to correct the palette if the resulting colours happen to
fall outside the valid RGB range (passed as-is to |
Details
The marine.colours
palette is used by default by all
plot
methods (e.g. plot.feem
) for FEEM-like data
to show absolute values. It is designed to retain perceptual
uniformity even after complete desaturation.
The diverging.colours
palette is used by
plot.feem.resid
to display residual values. People with
severe colour vision deficiency (tritanopia or monochromacy) won't be
able to discern positive and negative branches of the palette, but
it's supposed to be legible for people with deuteranopia and
protanopia.
Value
A character vector of length n
containing colour specifications
for use with R graphics functions.
The marine.colours
palette at the default values of
C_{uv}^* = 0.65
, L^* \in [0.35; 1]
, \alpha = \gamma = 1
.
The diverging.colours
palette at the default values of
C_{uv}^* \in [0.1; 0.75]
, L^* \in [0.35;
1]
,
\alpha = \gamma = 1
.
References
Insired by cmocean palette called “haline” (https://matplotlib.org/cmocean/#haline), but using R's implementation of polar CIE-LUV colour space instead of CAM02-UCS.
CUBEHELIX (https://people.phy.cam.ac.uk/dag9/CUBEHELIX/) is a similar technique using BT.601 luminance coefficients and RGB colour space.
See Also
Examples
image(volcano, col = marine.colours(256))
Plot a FEEM object
Description
Plot a 2D fluorescence intensity surface as a pseudo-colour image.
Usage
## S3 method for class 'feem'
plot(
x, xlab, ylab, cuts = 128,
col.regions = marine.colours(256), ...
)
## S3 method for class 'feemcube'
plot(
x, xlab, ylab, cuts = 128,
col.regions = marine.colours(256), as.table = TRUE, ...
)
## S3 method for class 'feem.resid'
plot(
x, ..., at, col.regions = diverging.colours(256)
)
Arguments
x |
An FEEM object. |
xlab |
The x-axis label for the plot, with a sane default. |
ylab |
The y-axis label for the plot, with a sane default. |
cuts |
The number of distinct levels the intensity would be divided into, areas between them assigned different colours. |
col.regions |
The palette to take the colours from, a character vector of R colour specifications. |
at |
The breakpoints in the intensity values. For residual plots,
automatically set in a symmetric manner, ignoring the |
as.table |
Whether to draw the panels left to right, top to bottom. (Otherwise they are drawn left to right, bottom to top.) |
... |
Passed as-is to |
Value
A lattice plot object. Its print
or plot
method
will draw the plot on an appropriate plotting device.
See Also
Examples
plot(feem(matrix(1:42/42, nrow = 7), 320 + 1:7, 300 + 1:6))
Implementation notes for Whittaker smoothing and interpolation of surfaces
Description
Smooth (Eilers 2003) or estimate the baseline (Eilers 2004) of a surface measured on an arbitrary grid by minimising a sum of penalties. Combined difference orders and two different methods of preventing negative values in the output are supported.
This is not a public interface. Subject to change without further notice. Please do not call from outside albatross.
Usage
whittaker2(x, y, z, lambda, d, p, logscale, nonneg)
diffmat(x, y, d)
vandermonde(x0)
Arguments
x |
Grid values along the rows of |
y |
Grid values along the columns of |
z |
Matrix containing the surface values to smooth or |
lambda |
A vector of smoothness penalties, one for every difference order.
Must be of the same length as |
d |
|
p |
If not missing, use the asymmetric penalty method
(Eilers 2004) to estimate the baseline by penalising the
differences with weight |
logscale |
If not Such transformation prevents the resulting values from getting lower
than A typical value would be |
nonneg |
If not |
x0 |
A vector specifying the grid where the function to be differentiated is measured. Must be sorted. |
Details
Finite difference approximation
How to differentiate a function tabulated on a fixed, potentially nonuniform grid before you even know its values? Use its Taylor series.
First derivative is special because it's possible to use central
differences and get a second-order accurate result, even on a
non-uniform grid, by carefully choosing the points where the
derivative is calculated. Let x + \frac{h}{2}
and
x - \frac{h}{2}
be a pair of adjacent points from the
grid. Here's the Taylor series expansion for f
around x
,
with the Lagrange form of the reminder:
f \left(x + \frac{h}{2}\right) =
f(x) + \frac{h}{2} f'(x) + \frac{h^2}{8} f''(x) +
\frac{h^3}{48} f'''(\zeta)
f \left(x - \frac{h}{2}\right) =
f(x) - \frac{h}{2} f'(x) + \frac{h^2}{8} f''(x) -
\frac{h^3}{48} f'''(\eta)
f \left(x + \frac{h}{2}\right) -
f \left(x - \frac{h}{2}\right) = h f'(x) +
\frac{h^3}{48} \left(f'''(\zeta) + f'''(\eta)\right)
f'(x) = \frac{
f\left(x + \frac{h}{2}\right) - f\left(x - \frac{h}{2}\right)
}{h} - \frac{h^2}{48} \left(f'''(\zeta) + f'''(\eta)\right)
|\delta f'(x)| \le \max_{
\xi \in [x - \frac{h}{2}, x + \frac{h}{2}]
} \frac{h^2}{24} f'''(\xi)
Suppose the three grid points
\xi_1 = x_1 - \frac{h_1}{2}
,
\xi_2 = x_1 + \frac{h_1}{2} = x_2 - \frac{h_2}{2}
,
\xi_3 = x_2 + \frac{h_2}{2}
are adjacent on the grid, and we
know the f'
estimates in x_1
and
x_2
:
On the one hand, Taylor series
expansion for f'(x)
around
\frac{x_1 + x_2}{2}
gives:
f'(x_2) = f'\left(\frac{x_1 + x_2}{2}\right)
+ f''\left(\frac{x_1 + x_2}{2}\right)\frac{x_2 - x_1}{2}
+ f'''\left(\frac{x_1 + x_2}{2}\right)\frac{(x_2 - x_1)^2}{8}
+ f''''(\zeta)\frac{(x_2 - x_1)^3}{48}
f'(x_1) = f'\left(\frac{x_1 + x_2}{2}\right)
- f''\left(\frac{x_1 + x_2}{2}\right)\frac{x_2 - x_1}{2}
+ f'''\left(\frac{x_1 + x_2}{2}\right)\frac{(x_2 - x_1)^2}{8}
- f''''(\eta)\frac{(x_2 - x_1)^3}{48}
f''\left(\frac{x_1 + x_2}{2}\right) =
\frac{f'(x_2) - f'(x_1)}{x_2 - x_1} -
\frac{(x_2 - x_1)^2}{48}(f''''(\zeta) + f''''(\eta))
|\delta f''(x)| \le \max_{
\xi \in [x_1, x_2]
} \frac{(x_2 - x_1)^2}{24} f''''(\xi)
On the other hand, if we substitute the estimations of f'(x)
from above, we get:
f''\left(\frac{x_1 + x_2}{2}\right) = \frac{
h_1 f(\xi_3) - h_1 f(\xi_2)
- h_2 f(\xi_2) + h_2 f(\xi_1)
}{h_1 h_2 (x_2 - x_1)} - \frac{
h_2^2 f'''(\zeta_2)
- h_1^2 f'''(\zeta_1)
}{24(x_2 - x_1)}
- \frac{(x_2 - x_1)^2}{24} f''''(\eta)
This is why we can't just keep using central differences and get
n+1
th order accurate results.
What are the general methods of finding the coefficients for the
\mathbf D
matrix? Start with a system of Taylor
series expansions for every grid point:
f(x_i) = \sum_{k=0}^{n-1} f^{(k)}(x) \frac{(x_i - x)^k}{k!}
+ f^{(n)}(\xi) \frac{(x_i - x)^{n}}{n!}
\; \forall i = 1 \dots n
We can solve this system for coefficients
c_i
giving the desired l
-th
derivative estimate with highest accuracy p
possible
(LeVeque 2007):
\sum_{i = 1}^n c_i f(x_i) = f^{(l)}(x) + o(h^p)
Substituting the approximations for
f(x_i)
into the equation, we get the
following condition for the multiplier in front of each
f^{(k)}(x)
:
\frac{1}{k!} \sum_{i = 1}^n c_i (x_i - x)^k = \mathbf{1}_{k = l}
\; \forall k = 0 \dots n-1
In the matrix form, this becomes a Vandermonde system:
V_{k,i} = \frac{(x_i - x)^k}{k!}
b_k = \mathbf{1}_{k = l}
\mathbf c = \mathbf{V}^{-1} \mathbf b
Unfortunately, this system becomes ill-conditioned for “large”
numbers of points. (Experiment shows noticeably growing
c_i
even for third derivative from 10
points and no solution for 32
points on a uniform grid.)
Fornberg (Fornberg 1988) suggests a more numerically stable
procedure, but it still breaks down for 1000
points.
It is important to note that the performance of the method depends
on the matrix \mathbf D
being sparse. While the methods
described above could give more accurate results, they do so at the cost of
providing nonzero weights for a lot of points, and the weights get larger
as the number of points increases. Therefore, with the knowledge that
difference orders above 3
are used very rarely and the interest in
simplicity and performance, we'll minimise the number of coefficients and
their values by solving the Vandermonde system for the minimally accurate
derivative estimations, taking exactly k + 1
points for k
-th
derivative.
What is the error of such estimation? Substituting the Lagrange form
of the remainder into
\mathbf{c}^\top f(\mathbf x)
, we get:
\sum_{i = 1}^n c_i f(x_i) = f^{(n-1)}(x) +
\sum_{i = 1}^n c_i f^{(n)}(\xi_i) \frac{(x_i - x)^n}{n!},
\; \xi_i \in [ x_i, x ]
Our choice of x
(middle point for odd n
, average of
middle points for even n
) lets us shave off one term from the
sum above for odd n
and get second order accurate results for
n = 2
, but other than that, the method is n
-th order
accurate.
Whittaker smoothing
Whittaker smoothing works by minimising a sum of penalties
(Eilers 2003). Interpolation can be achieved by setting weights
\mathbf w
to 0
for the missing points.
\min_{\mathbf{\hat z}} \:
(\mathbf{\hat z} - \mathbf{z})^\top
\mathrm{diag}(\mathbf w)
(\mathbf{\hat z} - \mathbf{z})
+ \lambda | \mathbf D \mathbf{\hat z} |^2
By writing down the derivatives over
\mathbf{\hat z}
and equating them to
0
, we get the normal equation:
(
\mathrm{diag}(\mathbf w) +
\lambda \mathbf{D}^\top \mathbf{D}
) \mathbf{\hat z} = \mathrm{diag}(\mathbf w) \mathbf z
The equation is then solved using solve
.
Given a one-dimensional penalty matrix
\mathbf{D}_d
of order d
obtained by solving a Vandermonde system for every successive group
of d+1
points, we can adapt it for every applicable group of
points from a fluorescence excitation-emission matrix unfolded into
a vector \mathbf z = \mathrm{vec}\, \mathbf F
by
taking Kronecker products with unit matrices:
\mathbf D = \begin{pmatrix}
\mathbf I \otimes \mathbf{D}_\mathrm{em} \\
\mathbf{D}_\mathrm{ex} \otimes \mathbf I
\end{pmatrix}
Penalties of different orders are concatenated by rows in a similar
manner (which is equivalent to adding the corresponding
\mathbf{D}^\top\mathbf D
matrices together in the normal equation).
It has been shown in (Eilers and Goeman 2004) that a combination of
first- and second-order penalty (
2 \lambda \mathbf{D}_1 + \lambda^2 \mathbf{D}_2
)
results in a non-negative impulse response, but the resulting peak
shape may be sub-optimal.
Value
whittaker2 |
A matrix of the same shape as |
diffmat |
A difference matrix |
vandermonde |
A vector |
References
Eilers PHC (2003). “A Perfect Smoother.” Analytical Chemistry, 75(14), 3631-3636. doi:10.1021/ac034173t.
Eilers PHC (2004). “Parametric Time Warping.” Analytical Chemistry, 76(2), 404-411. doi:10.1021/ac034800e.
Eilers PHC, Goeman JJ (2004). “Enhancing scatterplots with smoothed densities.” Bioinformatics, 20(5), 623-628. doi:10.1093/bioinformatics/btg454.
Fornberg B (1988). “Generation of finite difference formulas on arbitrarily spaced grids.” Mathematics of Computation, 51(184), 699-706. doi:10.1090/S0025-5718-1988-0935077-0.
LeVeque RJ (2007). “Finite Difference Approximations.” In Finite Difference Methods for Ordinary and Partial Differential Equations, chapter 1, 3-11. Society for Industrial and Applied Mathematics (SIAM). https://faculty.washington.edu/rjl/fdmbook/.
See Also
Examples
data(feems)
z <- feemscatter(feems$a, rep(25, 4), 'omit')
str(albatross:::whittaker2(
attr(z, 'emission'), attr(z, 'excitation'), z,
c(1, 1e-3), 1:2, logscale = NA, nonneg = 1
))
Export a PARAFAC model for the OpenFluor database
Description
Prepares a fitted PARAFAC model for submission to OpenFluor - an online spectral database of fluorescence by environmental organic compounds.
Usage
write.openfluor(
model, filename, name = "?", creator = "?", doi = "?",
reference = "?", unit = "?", toolbox =, date =, fluorometer = "?",
constraints =, validation = "?", methods = "?", preprocess = "?",
sources = "?", ecozones = "?", description = "",
shift = FALSE, scale = TRUE
)
Arguments
model |
A |
filename |
Path to the text file to create from the |
name |
Short name of the model. |
creator |
Name of the creator of the model. |
doi |
Digital object identifier of the referenced source. Can also be “ISBN:...” for books. |
reference |
Full citation for the referenced source using the following style: “Author AA, Author BB, Author CC, (year), ‘Title’, Journal Abbrev, Vol, pages”. |
unit |
Units the fluorescence was measured in. Typically, one of “RU”, “QSE”, “AU”. |
toolbox |
Defaults to “albatross version, multiway version”. |
date |
Defaults to today, in “yyyy-mm-dd” format. |
fluorometer |
The model of the instrument that produced the data. |
constraints |
Constraints applied to the PARAFAC model. Defaults to
|
validation |
Validation method used for the PARAFAC model, examples include: “Split-Half Analysis”, “core-consistency”. |
methods |
The sequence of steps taken to handle the samples and to ensure proper fluorescence intensity measurement. Examples include:
|
preprocess |
PARAFAC-specific pre-processing steps applied to the dataset. Examples include (but are not limited to):
|
sources |
Should preferably include one or more of the following keywords:
|
ecozones |
List all major or minor terrestrial, freshwater and marine ecozones and ecoregions that apply. The full set of possible options is too large to include here, but see https://en.wikipedia.org/wiki/Lists_of_ecoregions for a source of inspiration. |
description |
Brief description of the model and its source data in |
shift , scale |
If If Note that OpenFluor clamps values outside the |
Details
Provided the model
and the filename
arguments, this
function exports the loadings into a file that passes OpenFluor syntax
check and is suitable for further editing. Alternatively, some or all
of the fields may be specified programmatically.
The fields constraints
, methods
, preprocess
,
sources
, ecozones
can be specified as character vectors
(to be comma-separated on output); others should be single strings.
References
Murphy KR, Stedmon CA, Wenig P, Bro R (2014). “OpenFluor - an online spectral library of auto-fluorescence by organic compounds in the environment.” Analytical Methods, 6, 658-661. doi:10.1039/C3AY41935E.
https://openfluor.lablicate.com/
See Also
Examples
data(feems)
cube <- feemscale(feemscatter(cube, c(24, 14)), na.rm = TRUE)
factors <- feemparafac(cube, nfac = 3, ctol = 1e-4)
# all defaults
write.openfluor(factors, f1 <- tempfile(fileext = '.txt'))
if (interactive()) file.show(f1)
unlink(f1)
# all non-default arguments
write.openfluor(
factors, f2 <- tempfile(fileext = '.txt'), name = 'example',
creator = 'J. Doe', doi = '10.1000/1', reference = paste(
'Upper D, (1973),', "'The unsuccessful self-treatment of a case",
"of \"writer's block\"',", 'J Appl Behav Anal, 7(3), 497'
), unit = 'AU', toolbox = 'all calculations done by hand',
date = '2038-01-19', fluorometer = 'Acme Fluor-o-matic 9000',
constraints = 'non-negative', validation = 'prior knowledge',
methods = 'Instrument spectral bias correction: Ex & Em',
preprocess = 'Scatter region excised (replaced with NaNs)',
sources = 'freshwater', ecozones = 'Balkash',
description = 'not a real model', shift = FALSE, scale = TRUE
)
if (interactive()) file.show(f2)
unlink(f2)