Title: | 'TRFLP' Analysis and Matching Package for R |
Version: | 1.0-10 |
License: | GPL-2 |
Depends: | R (≥ 2.4) |
URL: | https://github.com/richfitz/TRAMPR |
Description: | Matching terminal restriction fragment length polymorphism ('TRFLP') profiles between unknown samples and a database of known samples. 'TRAMPR' facilitates analysis of many unknown profiles at once, and provides tools for working directly with electrophoresis output through to generating summaries suitable for community analyses with R's rich set of statistical functions. 'TRAMPR' also resolves the issues of multiple 'TRFLP' profiles within a species, and shared 'TRFLP' profiles across species. |
NeedsCompilation: | no |
Packaged: | 2022-02-07 18:07:37 UTC; rich |
Author: | Rich FitzJohn [aut, cre], Ian Dickie [aut] |
Maintainer: | Rich FitzJohn <rich.fitzjohn@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2022-02-07 18:40:02 UTC |
The TRAMPR Package (TRFLP Analysis and Matching Package for R)
Description
This package contains a collection of functions to help analyse terminal restriction fragment length polymorphism (TRFLP) profiles, by matching unknown peaks to known TRFLP profiles in order to identify species.
The TRAMPR package contains a vignette, which includes a worked example; type vignette("TRAMPRdemo") to view it. To see all documented help topics, type library(help=TRAMPR).
Details
Start by reading the TRAMP (and perhaps create.diffsmatrix) help pages, which explain the matching algorithm.
Then read load.abi to learn how to load ABI format data into the program. Alternatively, read TRAMPsamples and read.TRAMPsamples to load already-processed data.
If you already have a collection of knowns, read TRAMPknowns and read.TRAMPknowns to learn how to load them. Otherwise, read build.knowns to learn how to automatically generate a set of known profiles from your data.
Once your data are loaded, reread TRAMP to do the analysis, then read plot.TRAMP and summary.TRAMP to examine the analysis. update.TRAMP may also be useful for modifying your matches. summary.TRAMP is also useful for preparing presence/absence matrices for analysis with other tools (e.g. the vegan package; see the vignette indicated above).
TRAMPR works with database-like objects, and a basic understanding of relational databases and primary/foreign keys will aid in understanding some aspects of the package.
Citation
Please see citation("TRAMPR") for the citation of TRAMPR.
Note
TRAMPR is designed specifically for “database TRFLP” (identifying species based on a database of known TRFLP profiles; see Dickie et al. 2002). It is not designed for direct community analysis of TRFLP profiles as in peak-profile TRFLP.
Author(s)
Rich FitzJohn and Ian Dickie, Landcare Research
References
Dickie IA, FitzJohn RG 2007. Using terminal-restriction fragment length polymorphism (T-RFLP) to identify mycorrhizal fungi: a methods review. Mycorrhiza 17: 259-270.
Dickie IA, Xu B, Koide RT 2002. Vertical distribution of ectomycorrhizal hyphae in soil as shown by T-RFLP analysis. New Phytologist 156: 527-535.
FitzJohn RG, Dickie IA 2007. TRAMPR: An R package for analysis and matching of terminal-restriction fragment length polymorphism (TRFLP) profiles. Molecular Ecology Notes. doi:10.1111/j.1471-8286.2007.01744.x
TRFLP Analysis and Matching Program
Description
Determine if TRFLP profiles may match those in a database of knowns. The resulting object can be used to produce a presence/absence matrix of known profiles in environmental samples.
The TRAMPR package contains a vignette, which includes a worked example; type vignette("TRAMPRdemo") to view it.
Usage
TRAMP(samples, knowns, accept.error=1.5, min.comb=4, method="maximum")
Arguments
samples |
A |
knowns |
A |
accept.error |
The largest acceptable difference (in base pairs)
between any peak in the sample data and the knowns database (see
Details; interpretation will depend on the value of |
min.comb |
Minimum number of enzyme/primer combinations required
before presence will be tested. The default (4) should be
reasonable in most cases. Setting |
method |
Method used in calculating the difference between
samples and knowns; may be one of |
Details
TRAMP attempts to determine which species in the ‘knowns’ database may be present in a collection of samples.
A sample matches a known if it has a peak that is “close enough” to every peak in the known for every enzyme/primer combination that they share. The default is to accept matches where the largest distance between a peak in the knowns database and the sample is less than accept.error base pairs (default 1.5), and where at least min.comb enzyme/primer combinations are shared between a sample and a known (default 4).
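The acceptance rule above can be sketched for a single sample/known pair in base R. This follows the wording of the rule (strictly "less than" accept.error, default "maximum" distance) and assumes the per-combination differences are already available; is_match() is a hypothetical helper for illustration, not a TRAMPR function.

```r
## is_match() is a hypothetical helper, not a TRAMPR function.
## diffs: one signed difference (bp) per enzyme/primer combination,
## NA where the sample or the known lacks that combination.
is_match <- function(diffs, accept.error=1.5, min.comb=4) {
  shared <- diffs[!is.na(diffs)]      # combinations present in both
  if (length(shared) < min.comb)      # too few shared combinations
    return(FALSE)
  max(abs(shared)) < accept.error     # largest error must be small
}
is_match(c(0.4, -1.1, 0.2, 0.9))      # TRUE: all within 1.5 bp
is_match(c(0.4, -1.1, 0.2, 2.3))      # FALSE: one combination 2.3 bp out
is_match(c(0.4, -1.1, NA, 0.9))       # FALSE: only three shared
```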
The three-dimensional matrix of match errors is generated by create.diffsmatrix. In the resulting array, m[i,j,k] is the difference (in base pairs) between the ith sample and the jth known for the kth enzyme/primer combination.
If p_k and q_k are the sizes of peaks for the kth enzyme/primer combination for a sample and known (respectively), then maximum distance is defined as
\max_k |p_k - q_k|
Euclidean distance is defined as
\frac{1}{n}\sqrt{\sum_{k=1}^{n} (p_k - q_k)^2}
and Manhattan distance is defined as
\frac{1}{n}\sum_{k=1}^{n} |p_k - q_k|
where n is the number of shared enzyme/primer combinations, since this may vary across sample/known combinations. For Euclidean and Manhattan distances, accept.error then becomes the mean distance, rather than the total distance.
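The three distance definitions can be written out for one sample/known pair as a short base-R sketch. tramp_distance() is a hypothetical helper, and its method labels are this sketch's own; they are not guaranteed to be the exact strings accepted by TRAMP's method argument.

```r
## tramp_distance() is a hypothetical helper (not TRAMPR's internal code).
## diffs holds p_k - q_k per enzyme/primer combination; NA = not shared.
tramp_distance <- function(diffs, method=c("maximum", "euclidean", "manhattan")) {
  method <- match.arg(method)
  d <- diffs[!is.na(diffs)]   # keep only shared combinations
  n <- length(d)              # n varies between sample/known pairs
  switch(method,
         maximum   = max(abs(d)),
         euclidean = sqrt(sum(d^2)) / n,
         manhattan = sum(abs(d)) / n)
}
diffs <- c(1, -2, NA, 2)             # three shared combinations
tramp_distance(diffs, "maximum")     # 2
tramp_distance(diffs, "euclidean")   # sqrt(9) / 3 = 1
tramp_distance(diffs, "manhattan")   # 5 / 3
```

Dividing by n is what makes accept.error a per-combination (mean) tolerance for the Euclidean and Manhattan methods, so the same threshold can be applied to pairs sharing different numbers of combinations.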
Value
A TRAMP object, with elements:
presence |
Presence/absence matrix. Rows are different samples
(with rownames from |
error |
Matrix of distances between the samples and known,
calculated by one of the methods described above. Rows correspond
to different samples, and columns correspond to different knowns.
The matrix dimension names are set to the values |
n |
A two-dimensional matrix (same dimensions as |
diffsmatrix |
Three-dimensional array of output from
|
enzyme.primer |
Different enzyme/primer combinations present in
the data, in the order of the third dimension of |
samples , knowns , accept.error , min.comb , method |
The input data objects and arguments, unmodified. |
In addition, an element presence.ign is included to allow matches to be ignored. However, this interface is experimental and its current format should not be relied on; use remove.TRAMP.match rather than interacting directly with presence.ign.
Matching is based only on peak size (in base pairs), and does not consider peak heights.
See Also
See create.diffsmatrix for discussion of how differences between sample and known profiles are generated.
plot.TRAMP, which displays TRAMP fits graphically.
summary.TRAMP, which creates a presence/absence matrix.
remove.TRAMP.match, which marks TRAMP matches as ignored.
Examples
data(demo.knowns)
data(demo.samples)
res <- TRAMP(demo.samples, demo.knowns)
## The resulting object can be interrogated with methods:
## The goodness of fit of the sample with sample.pk=101 (see
## ?plot.TRAMP).
plot(res, 101)
## Not run:
## To see all plots (this produces many figures), one after another.
op <- par(ask=TRUE)
plot(res)
par(op)
## End(Not run)
## Produce a presence/absence matrix (see ?summary.TRAMP).
m <- summary(res)
head(m)
Index (Subset) TRAMPsamples and TRAMPknowns Objects
Description
This provides very basic support for subsetting TRAMPsamples and TRAMPknowns objects.
Usage
## S3 method for class 'TRAMPknowns'
x[i, na.interp=TRUE, ...]
## S3 method for class 'TRAMPsamples'
x[i, na.interp=TRUE, ...]
Arguments
x |
A |
i |
A vector of |
na.interp |
Logical: Controls how |
... |
Further arguments passed to or from other methods. |
Details
When indexing by logical vectors, NA values do not make valid indexes, but may be produced when testing columns that contain missing values, so these must be converted to either TRUE or FALSE. If i is a logical index that contains missing values (NAs), then na.interp controls how they will be interpreted:
- If na.interp=TRUE, then TRUE, FALSE, NA becomes TRUE, FALSE, TRUE.
- If na.interp=FALSE, then TRUE, FALSE, NA becomes TRUE, FALSE, FALSE.
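The NA-interpretation rule above amounts to replacing every NA in the logical index with the value of na.interp. A minimal base-R sketch; interp_index() is a hypothetical helper, not the package's own code.

```r
## interp_index() is a hypothetical helper illustrating na.interp.
interp_index <- function(i, na.interp=TRUE) {
  i[is.na(i)] <- na.interp   # every NA becomes TRUE or FALSE
  i
}
i <- c(TRUE, FALSE, NA)      # e.g. from testing a column with NAs
interp_index(i, na.interp=TRUE)    # TRUE FALSE TRUE
interp_index(i, na.interp=FALSE)   # TRUE FALSE FALSE
```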
Warning
For TRAMPknowns objects, if the file.pat element is specified as part of the object (see TRAMPknowns), then the subsetted TRAMPknowns object will be written to a file. This may not be what you want, so it is probably best to disable knowns writing by doing x$file.pat <- NULL before doing any subsetting (where x is the name of your TRAMPknowns object).
Examples
data(demo.samples)
data(demo.knowns)
## Subsetting by sample.fk values
labels(demo.samples)
demo.samples[c(101, 102, 110)]
labels(demo.samples[c(101, 102, 110)])
## Take just samples from the first 10 soilcores:
demo.samples[demo.samples$info$soilcore.fk <= 10]
## Indexing also works on TRAMPknowns:
demo.knowns[733]
labels(demo.knowns[733])
TRAMPknowns Objects
Description
These functions create and interact with TRAMPknowns objects (collections of known TRFLP patterns). Knowns contrast with “samples” (see TRAMPsamples) in that knowns contain identified profiles, while samples contain unidentified profiles. Knowns must have at most one peak per enzyme/primer combination (see Details).
Usage
TRAMPknowns(data, info, cluster.pars=list(), file.pat=NULL,
warn.factors=TRUE, ...)
## S3 method for class 'TRAMPknowns'
labels(object, ...)
## S3 method for class 'TRAMPknowns'
summary(object, include.info=FALSE, ...)
Arguments
data |
data.frame containing peak information. |
info |
data.frame, describing individual samples (see Details for definitions of both data.frames). |
cluster.pars |
Parameters used when clustering the knowns database. See Details. |
file.pat |
Optional partial filename in which to store knowns
database after modification. Files |
warn.factors |
Logical: Should a warning be given if any columns
in |
object |
A |
include.info |
Logical: Should the output be augmented with the
contents of the |
... |
|
Details
The object has at least two components, which relate to each other (in the sense of a relational database). info holds information about the individual knowns, and data holds information about individual peaks (many of which may belong to a single known).
Column definitions:
- info:
  - knowns.pk: Unique positive integer, used to identify individual knowns (i.e. a “primary key”).
  - species: Character, giving species name.
- data:
  - knowns.fk: Positive integer, indicating which known the peak belongs to (by matching against info$knowns.pk) (i.e. a “foreign key”).
  - primer: Character, giving the name of the primer used.
  - enzyme: Character, giving the name of the restriction digest enzyme used.
  - size: Numeric, giving size (in base pairs) of the peak.
In addition, TRAMPknowns will create additional columns holding clustering information (see group.knowns). Additional columns are allowed (and retained, but ignored) in both data.frames.
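The relational structure above can be checked by hand before calling TRAMPknowns. The sketch below uses invented values and base R only; the checks illustrate the rules stated on this page and are not the validation that TRAMPknowns itself performs.

```r
## Invented example values following the column definitions above.
knowns.info <- data.frame(knowns.pk=c(1L, 2L),
                          species=c("Species a", "Species b"),
                          stringsAsFactors=FALSE)
knowns.data <- data.frame(knowns.fk=c(1L, 1L, 2L),
                          primer=c("ITS1F", "ITS4", "ITS1F"),
                          enzyme=c("BsuRI", "BsuRI", "BsuRI"),
                          size=c(112.5, 340.2, 98.7),
                          stringsAsFactors=FALSE)
## Primary key: unique positive integers.
stopifnot(!anyDuplicated(knowns.info$knowns.pk),
          all(knowns.info$knowns.pk > 0))
## Foreign key: every peak refers to an existing known.
stopifnot(all(knowns.data$knowns.fk %in% knowns.info$knowns.pk))
## Knowns: at most one peak per known and enzyme/primer combination.
stopifnot(!anyDuplicated(knowns.data[c("knowns.fk", "enzyme", "primer")]))
```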
Additional objects are allowed as part of the TRAMPknowns object, but these will not be written by write.TRAMPknowns; any extra objects passed (via ...) will be included in the final TRAMPknowns object.
The cluster.pars argument controls how knowns will be clustered (this will happen automatically as needed). Elements of the list cluster.pars may be any of the three arguments to group.knowns, and will be used as defaults in subsequent calls to group.knowns. If not provided, default values are: dist.method="maximum", hclust.method="complete", cut.height=2.5 (if only some elements of cluster.pars are provided, the remaining elements default to the values above). To change values of clustering parameters in an existing TRAMPknowns object, use group.knowns.
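The defaulting rule for a partial cluster.pars behaves like a list merge. As a sketch (assuming merge semantics; this is not the package's own code), utils::modifyList() fills in whichever elements are missing:

```r
## Documented defaults for cluster.pars.
cluster.defaults <- list(dist.method="maximum",
                         hclust.method="complete",
                         cut.height=2.5)
cluster.pars <- list(cut.height=2)   # user supplies only one element
## Supplied elements override defaults; the rest are kept.
pars <- utils::modifyList(cluster.defaults, cluster.pars)
str(pars)
```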
A known contains at most one peak per enzyme/primer combination. Where a species is known to have multiple TRFLP profiles, these should be treated as separate knowns with different, unique, knowns.pk values, but with identical species values. A sample containing either pattern will then be recorded as having that species present (see group.knowns).
Value
TRAMPknowns |
A new |
labels.TRAMPknowns |
A sorted vector of the unique samples
present in |
summary.TRAMPknowns |
A data.frame, with the size of the peak (if
present) for each enzyme/primer combination, with each known
(indicated by |
Note
Across a TRAMPknowns object, primer and enzyme names must be exactly the same (including case and whitespace) to be considered the same. For example "ITS4", "Its4", "ITS 4" and "ITS4 " would be considered to be four different primers.
Factors will not merge correctly (with combine.TRAMPknowns or add.known). TRAMPknowns will attempt to catch factor columns and convert them into characters for the info and data data.frames. Other objects (passed as part of ...) will not be altered.
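A quick base-R check of the exact-matching rule. The clean-up step is only one possible convention (an assumption about how your primers are named: upper case, no spaces), not something TRAMPknowns does for you.

```r
primers <- c("ITS4", "Its4", "ITS 4", "ITS4 ")
length(unique(primers))       # 4: treated as four different primers
## Hypothetical normalisation before building the object:
## remove all whitespace and convert to upper case.
cleaned <- toupper(gsub("[[:space:]]", "", primers))
length(unique(cleaned))       # 1
## Convert factor columns to character yourself if in doubt.
as.character(factor(c("BsuRI", "HpyCH4IV")))
```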
See Also
TRAMPsamples, which constructs an analogous object to hold “samples” data.
plot.TRAMPknowns, which creates a graphical representation of the knowns data.
TRAMP, for matching unknown TRFLP patterns to TRAMPknowns objects.
group.knowns, which groups similar knowns (generally called automatically).
add.known and combine.TRAMPknowns, which provide tools for adding knowns from a sample data set and merging knowns databases.
Examples
## This example builds a TRAMPknowns object from completely artificial
## data:
## The info data.frame:
knowns.info <-
data.frame(knowns.pk=1:8,
species=rep(paste("Species", letters[1:5]), length=8))
knowns.info
## The data data.frame:
knowns.data <- expand.grid(knowns.fk=1:8,
primer=c("ITS1F", "ITS4"),
enzyme=c("BsuRI", "HpyCH4IV"))
knowns.data$size <- runif(nrow(knowns.data), min=40, max=800)
## Construct the TRAMPknowns object:
demo.knowns <- TRAMPknowns(knowns.data, knowns.info, warn.factors=FALSE)
## A plot of the pretend knowns:
plot(demo.knowns, cex=1, group.clusters=TRUE)
TRAMPsamples Objects
Description
These functions create and interact with TRAMPsamples objects (collections of TRFLP patterns). Samples contrast with “knowns” (see TRAMPknowns) in that samples contain primarily unidentified profiles. In contrast with knowns, samples may have many peaks per enzyme/primer combination.
Usage
TRAMPsamples(data, info=NULL, warn.factors=TRUE, ...)
## S3 method for class 'TRAMPsamples'
labels(object, ...)
## S3 method for class 'TRAMPsamples'
summary(object, include.info=FALSE, ...)
Arguments
data |
data.frame containing peak information. |
info |
(Optional) data.frame, describing individual samples (see Details for definitions of both data.frames). If this is omitted, a basic data.frame will be generated. |
warn.factors |
Logical: Should a warning be given if any columns
in |
object |
A |
include.info |
Logical: Should the output be augmented with the
contents of the |
... |
|
Details
The object has at least two components, which relate to each other (in the sense of a relational database). info holds information about the individual samples, and data holds information about individual peaks (many of which belong to a single sample).
Column definitions:
- info:
  - sample.pk: Unique positive integer, used to identify individual samples (i.e. a “primary key”).
  - species: Character, giving species name if samples were collected from an identified species. If this column is missing, it will be initialised as NA.
- data:
  - sample.fk: Positive integer, indicating which sample the peak belongs to (by matching against info$sample.pk) (i.e. a “foreign key”).
  - primer: Character, giving the name of the primer used.
  - enzyme: Character, giving the name of the restriction digest enzyme used.
  - size: Numeric, giving size (in base pairs) of the peak.
  - height: Numeric, giving the height (arbitrary units) of the peak.
Additional columns are allowed (and ignored) in both data.frames, and will be retained. This allows notes on data quality and treatments to be easily included. Additional objects are allowed as part of the TRAMPsamples object; any extra objects passed (via ...) will be included in the final TRAMPsamples object.
If info is omitted, then a basic data.frame will be generated, containing just the unique values of sample.fk, and NA for species.
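A sketch of the kind of basic info data.frame described above, built in base R from invented peak data. This is an illustration, not the package's own code; sorting the keys is this sketch's choice.

```r
## Invented peak data: several peaks per sample.
samples.data <- data.frame(sample.fk=c(5L, 2L, 5L, 2L, 9L),
                           primer="ITS1F", enzyme="BsuRI",
                           size=c(120, 340, 122, 90, 415),
                           height=c(800, 1500, 300, 2000, 650),
                           stringsAsFactors=FALSE)
## One row per unique sample.fk, species unknown (NA).
samples.info <- data.frame(sample.pk=sort(unique(samples.data$sample.fk)),
                           species=NA_character_,
                           stringsAsFactors=FALSE)
samples.info
```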
Value
TRAMPsamples |
A new |
labels.TRAMPsamples |
A sorted vector of the unique samples
present in |
summary.TRAMPsamples |
A data.frame, with the number of peaks
per enzyme/primer combination, with each sample (indicated by
|
Note
Across a TRAMPsamples object, primer and enzyme names must be exactly the same (including case and whitespace) to be considered the same. For example "ITS4", "Its4", "ITS4 " and "ITS 4" would be considered to be four different primers.
Factors will not merge correctly (with combine.TRAMPsamples). TRAMPsamples will attempt to catch factor columns and convert them into characters for the info and data data.frames. Other objects (passed as part of ...) will not be altered.
See Also
plot.TRAMPsamples and summary.TRAMPsamples, for plotting and summarising TRAMPsamples objects.
TRAMPknowns, which constructs an analogous object to hold “knowns” data.
TRAMP, for analysing TRAMPsamples objects.
load.abi, which creates a TRAMPsamples object from GeneMapper (Applied Biosystems) output.
Absolute Minimum
Description
Returns the signed value of the element with the minimum absolute value in a vector.
Usage
absolute.min(x)
Arguments
x |
Numeric vector ( |
Value
A single value; the value with the smallest absolute value, but with its original sign. This is equivalent to (and implemented as)
x[which.min(abs(x))]
The value is NA if x has no non-NA values (cf. which.min).
Examples
set.seed(1)
x <- rnorm(16)
min(x) # -2.2147
min(abs(x)) # 0.0449
absolute.min(x) # -0.0449: preserves sign
# NA values OK:
absolute.min(c(-1, 4, NA))
# Slightly unintuitive behaviour:
absolute.min(numeric(0)) # numeric(0)
absolute.min(NA) # NA
Add Knowns To TRAMPknowns Databases
Description
Add a single known or many knowns to a knowns database in a TRAMPknowns object. add.known takes a TRAMPknowns object, and adds the peak profile of a single sample from a TRAMPsamples object. combine.TRAMPknowns combines two TRAMPknowns objects (similar to combine.TRAMPsamples).
add.known and combine are generic, so if the x argument is a TRAMP object, then the knowns component of that object will be updated.
Usage
add.known(x, ...)
## S3 method for class 'TRAMPknowns'
add.known(x, samples, sample.fk, prompt=TRUE, default.species=NULL, ...)
## S3 method for class 'TRAMP'
add.known(x, sample.fk, rebuild=TRUE, ...)
## S3 method for class 'TRAMPknowns'
combine(x, y, rewrite.knowns.pk=FALSE, ...)
## S3 method for class 'TRAMP'
combine(x, y, rebuild=TRUE, ...)
Arguments
x |
A |
samples |
A |
sample.fk |
|
prompt |
Logical: Should the function interactively prompt for a new species name? |
default.species |
Default species name. If |
y |
A second |
rewrite.knowns.pk |
Logical: If the new knowns data contain
|
rebuild |
Logical: should the |
... |
Additional arguments passed to future methods. |
Details
(add.known only): When adding the profile of a single individual via add.known, if more than one peak per enzyme/primer combination is present we select the most likely profile by picking the highest peak (largest height value) for each enzyme/primer combination (a warning will be given). If two peaks are of the same height, then the peak taken is unspecified (similar to build.knowns with min.ratio=0).
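The highest-peak selection described above can be sketched in base R for the peaks of one sample. This is an illustration of the rule, not add.known's own code; ties here go to whichever row which.max() finds first, whereas the package leaves the tie case unspecified.

```r
## Invented peaks for one sample: two peaks for BsuRI/ITS1F.
peaks <- data.frame(enzyme=c("BsuRI", "BsuRI", "HpyCH4IV"),
                    primer=c("ITS1F", "ITS1F", "ITS1F"),
                    size=c(150.2, 310.8, 92.4),
                    height=c(400, 2600, 1200),
                    stringsAsFactors=FALSE)
## Row numbers grouped by enzyme/primer combination.
groups <- split(seq_len(nrow(peaks)), peaks[c("enzyme", "primer")],
                drop=TRUE)
## Within each group, keep the row with the largest height.
keep <- vapply(groups,
               function(rows) rows[which.max(peaks$height[rows])],
               integer(1))
profile <- peaks[sort(keep), ]
profile   # one peak per enzyme/primer combination
```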
(combine only): rewrite.knowns.pk provides a simple way of merging knowns databases that use the same values of knowns.pk. Because knowns.pk must be unique, if y (the new knowns database) uses knowns.pk values present in x (the original database), then the knowns.pk values in y must be rewritten. This will be done by adding max(labels(x)) to every knowns.pk value in y$info and knowns.fk value in y$data.
If retaining knowns.pk information is important, we suggest saving the value of knowns.pk before running this function, e.g.
info$knowns.pk.old <- info$knowns.pk
If more control over the renaming process is required, manually adjust y$info$knowns.pk yourself before calling this function.
However, by default no translation will be done, and an error will occur if x and y share knowns.pk values.
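The rewriting rule can be illustrated on bare data.frames. rewrite_pk() is a hypothetical helper that applies the offset max(labels(x)) described above; it is not the package's internal code, and it works on plain info/data tables rather than TRAMPknowns objects.

```r
## rewrite_pk() is a hypothetical helper illustrating the offset rule.
rewrite_pk <- function(x.info, y.info, y.data) {
  offset <- max(x.info$knowns.pk)          # plays the role of max(labels(x))
  y.info$knowns.pk <- y.info$knowns.pk + offset
  y.data$knowns.fk <- y.data$knowns.fk + offset   # keep keys in sync
  list(info=y.info, data=y.data)
}
x.info <- data.frame(knowns.pk=c(1L, 2L, 7L))
y.info <- data.frame(knowns.pk=c(1L, 3L))  # 1 clashes with x
y.data <- data.frame(knowns.fk=c(1L, 1L, 3L), size=c(100, 250, 410))
y.new <- rewrite_pk(x.info, y.info, y.data)
y.new$info$knowns.pk   # 8 10
y.new$data$knowns.fk   # 8 8 10
```

Because every new key is larger than the largest existing key, the rewritten values cannot collide with those in x.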
For add.known, only a subset of columns is passed to the knowns object (a future version may be more inclusive):
- From samples$info: sample.pk (as knowns.pk).
- From samples$data: sample.fk (as knowns.fk), primer, enzyme, size.
For combine, the data and info elements of the resulting TRAMPknowns object will have the union of the columns present in both sets of knowns. If any additional elements exist as part of the second TRAMPknowns object (e.g. passed as ... to TRAMPknowns when creating y), these will be ignored.
Value
An object of the same class as x: if a TRAMP object is supplied, a new TRAMP object with an updated TRAMPknowns component will be returned, and if the object is a TRAMPknowns object an updated TRAMPknowns object will be returned.
Note
If the TRAMPknowns object has a file.pat element (see TRAMPknowns), then the new knowns database will be written to file. This may be confusing when operating on TRAMP objects directly, since both the TRAMPknowns object used in the TRAMP object and the original TRAMPknowns object will share the same file.pat argument, but contain different data as soon as add.known or combine is used. In short, be careful! To avoid this issue, set file.pat to NULL before using add.known or combine.
See Also
build.knowns, which automatically builds a knowns database, and TRAMPknowns, which documents the object containing the knowns database.
combine.TRAMPsamples, which combines a pair of TRAMPsamples objects.
Examples
data(demo.knowns)
data(demo.samples)
## (1) Using add.known(), to add a single known:
## Sample "101" looks like a potential known, add it to our knowns
## database:
plot(demo.samples, 101)
## Add this to a knowns database:
## Because there is more than one peak per enzyme/primer combination, a
## warning will be given. In this case, since there are clear peaks it
## is harmless.
demo.knowns.2 <- add.known(demo.knowns, demo.samples, 101,
prompt=FALSE)
## The known has been added:
demo.knowns.2[101]
try(demo.knowns[101]) # error - known didn't exist in original knowns
## Same, but adding to an existing TRAMP object.
res <- TRAMP(demo.samples, demo.knowns)
plot(res, 101)
res2 <- add.known(res, 101, prompt=FALSE, default.species="New known")
## Now the new known matches itself.
plot(res2, 101)
## (2) Using combine() to combine knowns databases.
## Let's split the original knowns database in two:
demo.knowns.a <- demo.knowns[head(labels(demo.knowns), 10)]
demo.knowns.b <- demo.knowns[tail(labels(demo.knowns), 10)]
## Combining these is easy:
demo.knowns.c <- combine(demo.knowns.a, demo.knowns.b)
## Knowns from both the small databases are present in the new one:
identical(c(labels(demo.knowns.a), labels(demo.knowns.b)),
labels(demo.knowns.c))
## Demonstration of knowns rewriting:
demo.knowns.d <- demo.knowns.a
demo.knowns.a$info$from <- "a"
demo.knowns.d$info$from <- "d"
try(combine(demo.knowns.a, demo.knowns.d)) # error
demo.knowns.e <- combine(demo.knowns.a, demo.knowns.d,
rewrite.knowns.pk=TRUE)
## See that both data sets are here (check the "from" column).
demo.knowns.e$info
## Note that a better approach might be to manually resolve
## conflicting knowns.pk values before combining.
Automatically Build Knowns Database
Description
This function uses several filters to select likely knowns, and constructs a TRAMPknowns object from a TRAMPsamples object. Samples are considered to be “potential knowns” if they have data for an adequate number of enzyme/primer combinations, and if for each combination they have either a single peak, or a peak that is “distinct enough” from any other peaks.
Usage
build.knowns(d, min.ratio=3, min.comb=NA, restrict=FALSE, ...)
Arguments
d |
A |
min.ratio |
Minimum ratio of maximum to second highest peak to accept known (see Details). |
min.comb |
Minimum number of enzyme/primer combinations required for each known (see Details for behaviour of default). |
restrict |
Logical: Use only cases where |
... |
Additional arguments passed to |
Details
For all samples and enzyme/primer combinations, the ratio of the height of the highest peak to that of the second-highest peak is calculated. If it is greater than min.ratio, then that combination is accepted. If the sample has at least min.comb valid enzyme/primer combinations, then that sample is included in the knowns database. If min.comb is NA (the default), then every enzyme/primer combination present in the data is required.
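The peak-ratio filter can be sketched for one sample in base R. peak_ratio() is a hypothetical helper, not build.knowns itself; treating a lone peak as an infinite ratio (so it always passes) is this sketch's assumption, based on the Description's "either a single peak, or a peak that is distinct enough". The comparison is strict, matching the Note that min.ratio=1 rejects tied peaks.

```r
## peak_ratio() is a hypothetical helper: highest / second-highest height.
peak_ratio <- function(h) {
  h <- sort(h, decreasing=TRUE)
  if (length(h) < 2) Inf else h[1] / h[2]   # lone peak always passes
}
## Invented peak heights per enzyme/primer combination for one sample.
heights <- list("BsuRI/ITS1F"    = c(2400, 300),
                "BsuRI/ITS4"     = 1800,
                "HpyCH4IV/ITS1F" = c(900, 700))
ratios <- vapply(heights, peak_ratio, numeric(1))
ratios                       # 8, Inf, about 1.29
accepted <- ratios > 3       # min.ratio = 3 (strictly greater)
sum(accepted)                # 2 valid combinations; compare with min.comb
```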
Value
A new TRAMPknowns object. It will generally be necessary to edit this object; see read.TRAMPknowns for details on how to write, edit, and read back a modified object.
Note
If two peaks have the same height, then using min.ratio=1 will not allow the entry as part of the knowns database; use min.ratio=0 instead if this is desired. In this case, the peak chosen is unspecified.
Note that this function is sensitive to data quality. In particular, split peaks may cause a sample not to be added. These samples may be manually added using add.known.
Examples
data(demo.samples)
demo.knowns.auto <- build.knowns(demo.samples, min.comb=4)
plot(demo.knowns.auto, cex=.75)
Value Matching for Data Frames
Description
match-like classification for data.frames; returns a vector of row numbers of (first) matches of its first argument in its second, across shared column names. This is unlikely to be useful to casual TRAMPR users, but see the final example for a relevant usage.
Usage
classify(x, table, ...)
Arguments
x |
data.frame: containing columns with the values to be matched. |
table |
data.frame: where all columns contain the values to be matched against. |
... |
Additional arguments to |
Details
As with duplicated.data.frame, this works by pasting together a character representation of the rows separated by \r (a carriage return), so may be imperfect if the data.frame has characters with embedded carriage returns or columns which do not reliably map to characters.
Cases in x with NA values in any column shared with table will not be matched (and will return the value of nomatch). Rows of table with NA values in any column will match nothing.
All columns in table must be present in x, but x may have additional columns that will be ignored.
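The pasting approach described above can be reproduced in a few lines of base R. classify_sketch() is a hypothetical re-implementation of the idea, not classify itself; it builds "\r"-separated row keys and uses match() with incomparables=NA so that rows containing NA never match.

```r
## classify_sketch() is a hypothetical re-implementation of the idea.
classify_sketch <- function(x, table) {
  cols <- names(table)                 # all table columns must be in x
  key <- function(d)                   # one "\r"-separated key per row
    do.call(paste, c(unname(as.list(d[cols])), sep="\r"))
  kx <- key(x)
  kx[!complete.cases(x[cols])] <- NA   # NA rows in x match nothing
  kt <- key(table)
  kt[!complete.cases(table)] <- NA     # NA rows in table match nothing
  match(kx, kt, incomparables=NA)
}
tab <- data.frame(a=c("x", "y"), b=c(1, 2))
x <- data.frame(a=c("y", "x", "x", NA), b=c(2, 1, 2, 1), z=1:4)
classify_sketch(x, tab)   # 2 1 NA NA; column z is ignored
```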
Value
A vector of length nrow(x), with each element giving the row number in table where all elements match across shared columns.
See Also
match, on which this is based.
Examples
table <- data.frame(a=letters[1:3], b=rep(1:2, each=3))
x <- cbind(table[sample(nrow(table), 20, TRUE),], x=runif(20))
classify(x, table)
all.equal(table[classify(x, table),], x[names(table)])
## Select only a few cases from a TRAMPsamples data object,
## corresponding with 4 enzyme/primer combinations.
data(demo.samples)
d <- demo.samples$data
use <- expand.grid(primer=c("ITS1F", "ITS4"),
enzyme=c("HpyCH4IV", "BsuRI"))
classify(d, use)
d[!is.na(classify(d, use)),]
Combine Two Objects
Description
This function is used to combine TRAMPsamples objects together, and to combine TRAMPknowns objects with TRAMPknowns or TRAMP objects. combine is generic; please see combine.TRAMPsamples and combine.TRAMPknowns for more information.
Usage
combine(x, y, ...)
Arguments
x , y |
Objects to be combined. See
|
... |
Additional arguments required by methods. |
See Also
See combine.TRAMPsamples
and
combine.TRAMPknowns
for more information.
Combine TRAMPsamples Objects
Description
Combines two TRAMPsamples objects into one large TRAMPsamples object containing all the samples for both original objects.
Usage
## S3 method for class 'TRAMPsamples'
combine(x, y, rewrite.sample.pk=FALSE, ...)
Arguments
x , y |
|
rewrite.sample.pk |
Logical: If the new sample data ( |
... |
Further arguments passed to or from other methods. |
Details
For a discussion of rewrite.sample.pk, see the comments on rewrite.knowns.pk in the Details of combine.TRAMPknowns.
The data and info elements of the resulting TRAMPsamples object will have the union of the columns present in both sets of samples.
If any additional elements exist as part of the second TRAMPsamples object (e.g. passed as ... to TRAMPsamples), these will be ignored with a warning (see Examples).
See Also
combine.TRAMPknowns
, the method for
TRAMPknowns
objects.
Examples
data(demo.samples)
## Let's split the original samples database in two, and recombine.
demo.samples.a <- demo.samples[head(labels(demo.samples), 10)]
demo.samples.b <- demo.samples[tail(labels(demo.samples), 10)]
## Combining these is easy:
demo.samples.c <- combine.TRAMPsamples(demo.samples.a, demo.samples.b)
## There is a warning message because demo.samples.b contains extra
## elements:
names(demo.samples.b)
## In this case, these objects should not be combined, but in other
## cases it may be necessary to rbind() the extra objects together:
## Not run:
demo.samples.c$soilcore <- rbind(demo.samples.a$soilcore,
demo.samples.b$soilcore)
## End(Not run)
## This must be done manually, since there is no way of telling what
## should be done automatically. Ideas/contributions are welcome here.
Calculate Matrix of Distances between Peaks
Description
Generate an array of goodness-of-fit (or distance) between samples and knowns based on the sizes (in base pairs) of TRFLP peaks. For each sample/known combination, and for each enzyme/primer combination, this calculates the minimum distance between any peak in the sample and the single peak in the known.
Usage
create.diffsmatrix(samples, knowns)
Arguments
samples |
A |
knowns |
A |
Details
This function will rarely need to be called directly, but does most of the calculations behind TRAMP, so it is useful to understand how it works.
This function generates a three-dimensional s \times k \times n matrix of the (smallest, see below) distance in base pairs between peaks in a collection of unknowns (run data) and a database of knowns for several enzyme/primer combinations. s is the number of different samples in the samples data (length(labels(samples))), k is the number of different types in the knowns database (length(labels(knowns))), and n is the number of different enzyme/primer combinations. The enzyme/primer combinations used are all combinations present in the knowns database; combinations present only in the samples will be ignored. Not all samples need contain all enzyme/primer combinations present in the knowns.
In the resulting array, m[i,j,k] is the difference (in base pairs) between the ith sample and the jth known for the kth enzyme/primer combination. The ordering of the n enzyme/primer combinations is arbitrary, so a data.frame of combinations is included as the attribute enzyme.primer, where enzyme.primer$enzyme[k] and enzyme.primer$primer[k] correspond to the enzyme and primer used for the distances in m[,,k].
Each case in the knowns database has a single (or no) peak for each enzyme/primer combination, but each sample may contain multiple peaks for an enzyme/primer combination; the difference is always the smallest distance from the sample to the known peak. Where a sample and/or a known lacks an enzyme/primer combination, the value of the difference is NA. The smallest absolute distance is taken between sample and known peaks, but the sign of the difference is preserved (negative where the closest sample peak was less than the known peak, positive where greater; see absolute.min).
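How a single cell m[i,j,k] is defined can be sketched in base R for one sample, one known and one enzyme/primer combination. cell_diff() is a hypothetical helper, not create.diffsmatrix itself; it takes the sample peak closest to the known peak and keeps the sign (sample minus known), as described above.

```r
## cell_diff() is a hypothetical helper for one m[i, j, k] value.
cell_diff <- function(sample.sizes, known.size) {
  if (length(sample.sizes) == 0 || is.na(known.size))
    return(NA_real_)                 # combination missing on either side
  d <- sample.sizes - known.size     # negative: sample peak is smaller
  d[which.min(abs(d))]               # same rule as absolute.min()
}
cell_diff(c(101.5, 198.2, 302.9), 200)   # about -1.8
cell_diff(c(101.5, 204.0), 200)          # 4
cell_diff(numeric(0), 200)               # NA: sample lacks combination
```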
Value
A three-dimensional matrix, with an attribute enzyme.primer, described above.
See Also
TRAMP, which uses output from create.diffsmatrix.
Examples
data(demo.samples)
data(demo.knowns)
s <- length(labels(demo.samples))
k <- length(labels(demo.knowns))
n <- nrow(unique(demo.knowns$data[c("enzyme", "primer")]))
m <- create.diffsmatrix(demo.samples, demo.knowns)
dim(m)
identical(dim(m), c(s, k, n))
## Maximum error for each sample/known (i.e. across all enzyme/primer
## combinations), similar to how it is calculated by TRAMP
error <- apply(abs(m), 1:2, max, na.rm=TRUE)
dim(error)
## Euclidean error (see ?TRAMP)
error.euclid <- sqrt(rowSums(m^2, na.rm=TRUE, dims=2))/rowSums(!is.na(m), dims=2)
## Euclidian and maximum error will require different values of
## accept.error in TRAMP:
plot(error, error.euclid, pch=".")
Demonstration Knowns Database
Description
A knowns database, for demonstrating the TRAMPR package.
This is a subset of a full knowns database, is not intended to represent any real data set, and should not be assumed to be accurate.
The data are stored as a TRAMPknowns object. Columns in the info and data components are described on the TRAMPknowns page.
Usage
data(demo.knowns)
Licence
This data set is provided under a Creative Commons “Attribution-NonCommercial-NoDerivs 2.5” licence. Please see https://creativecommons.org/licenses/by-nc-nd/2.5/ for details.
Demonstration Samples Database
Description
A samples database, for demonstrating the TRAMPR package.
This is a subset of a full samples database, is not intended to
represent any real data set, and should not be assumed to be
accurate.
The data are stored as a TRAMPsamples object. Columns in the info and data components are described on the TRAMPsamples page, but with some additions:
- info:
  - soilcore.fk: Key to the soil core from which a sample came. See soilcore, below.
- data:
  - sample.file.name: Original .fsa file corresponding to the TRFLP run. This is included in all TRAMPsamples objects created by load.abi.
- soilcore: A data.frame with information about the soil core from which samples came.
  - soilcore.pk: Key distinguishing soil cores.
  - plot: Plot number (1 to 10).
  - elevation: Height above mean sea level, in metres.
  - east: Easting (New Zealand Map Grid/NZMG).
  - north: Northing (NZMG).
  - vegetation: Vegetation type (Nothofagus solandri or Pinus contorta).
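The soilcore.fk column in info is a foreign key into soilcore.pk, so the soil core details can be attached to each sample with a join. The sketch below uses small hypothetical data frames standing in for the info and soilcore components; with the real data set the analogous call would presumably pass demo.samples$info and the soilcore component, assuming the latter is stored as demo.samples$soilcore.

```r
# Hypothetical toy versions of the info and soilcore components
info <- data.frame(sample.pk=1:4, soilcore.fk=c(10, 10, 11, 12))
soilcore <- data.frame(soilcore.pk=c(10, 11, 12), plot=c(1, 1, 2),
                       vegetation=c("Nothofagus solandri", "Pinus contorta",
                                    "Pinus contorta"),
                       stringsAsFactors=FALSE)
# Attach soil core details to every sample via the foreign key
samples.cores <- merge(info, soilcore,
                       by.x="soilcore.fk", by.y="soilcore.pk")
samples.cores[order(samples.cores$sample.pk), ]
```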
Usage
data(demo.samples)
Format
A TRAMPsamples object.
Licence
This data set is provided under a Creative Commons “Attribution-NonCommercial-NoDerivs 2.5” licence. Please see https://creativecommons.org/licenses/by-nc-nd/2.5/ for details.
Knowns Clustering
Description
Group a TRAMPknowns object so that knowns with similar TRFLP patterns and knowns that share the same species name “group” together. In general, this function will be called automatically whenever appropriate (e.g. when loading a data set or adding new knowns). Please see Details to understand why this function is necessary, and how it works.
The main reason for manually calling group.knowns is to change the default values of the arguments; if you call group.knowns on a TRAMPknowns object, then any subsequent automatic call to group.knowns will use any arguments you passed in the manual group.knowns call (e.g. after doing x <- group.knowns(x, cut.height=20), all future groupings will use cut.height=20).
Usage
group.knowns(x, ...)
## S3 method for class 'TRAMPknowns'
group.knowns(x, dist.method, hclust.method, cut.height, ...)
## S3 method for class 'TRAMP'
group.knowns(x, ...)
Arguments
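x |
A TRAMPknowns or TRAMP object. |
dist.method |
Distance method used in calculating similarity between different knowns (see dist). |
hclust.method |
Clustering method used in generating clusters from the similarity matrix (see hclust). |
cut.height |
Passed to cutree: the height at which the clustering tree is cut to form groups. |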
... |
Arguments passed to further methods. |
Details
group.knowns groups together knowns in a TRAMPknowns object based on two criteria: (1) TRFLP profiles that are very similar across shared enzyme/primer combinations (based on clustering) and (2) TRFLP profiles that belong to the same species (i.e. share a common value in the species column of the info data.frame of x; see TRAMPknowns for more information). This addresses three issues in TRFLP analysis:
1. The TRFLP profile of a single species can have variation in peak sizes due to DNA sequence variation. By including multiple collections of each species, variation in TRFLP profiles can be accounted for. If a TRAMPknowns object contains multiple collections of a species, these will be aggregated by group.knowns. This aggregation is essential for community analysis, as leaving individual collections will artificially inflate the number of “present species” when running TRAMP. Some authors have taken an alternative approach, using a larger tolerance in matching peaks between samples and knowns (effectively increasing accept.error in TRAMP) to account for within-species variation. This is not recommended, as it dramatically increases the risk of incorrect matches.
2. Distinctly different TRFLP profiles may occur within a species (or in some cases within an individual); see Avis et al. (2006). group.knowns looks at the species column of the info data.frame of x and joins any knowns with identical species values as a group. This can also be used where multiple profiles are present in an individual.
3. Different species may share a similar TRFLP profile and therefore be indistinguishable using TRFLP. If these patterns are not grouped, two species will be recorded as present wherever either is present. group.knowns prevents this by joining knowns with “very similar” TRFLP patterns as a group. Ideally, these problematic groups can be resolved by increasing the number of enzyme/primer pairs in the data.
Group names are generated by concatenating all unique (sorted) species names in the group, separated by commas.
To determine if knowns are “similar enough” to form a group, we use R's clustering tools: dist, hclust and cutree. First, we generate a distance matrix of the knowns profiles using dist with method dist.method (see Example below; this is very similar to what TRAMP does, and dist.method should be specified accordingly). We then generate clusters using hclust with method hclust.method, and “cut” the tree at cut.height using cutree.
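The dist, hclust and cutree pipeline just described can be sketched in base R on a toy profile matrix. The peak sizes are invented, and the method names and cut height ("maximum", "complete", 2.5) are illustrative choices rather than documented defaults of group.knowns; the sketch also skips whatever reshaping group.knowns does to build the profile matrix from a TRAMPknowns object.

```r
# Toy knowns profiles: rows are knowns, columns are enzyme/primer
# combinations, values are peak sizes in base pairs (hypothetical data).
profiles <- rbind(k1=c(HinfI.ITS1F=110.2, HinfI.ITS4=250.1),
                  k2=c(HinfI.ITS1F=111.0, HinfI.ITS4=251.5),
                  k3=c(HinfI.ITS1F=180.4, HinfI.ITS4=320.9))
d <- dist(profiles, method="maximum")  # role of dist.method
tree <- hclust(d, method="complete")   # role of hclust.method
clusters <- cutree(tree, h=2.5)        # role of cut.height
clusters  # k1 and k2 share a cluster; k3 is on its own
```

Here k1 and k2 differ by at most 1.4 bp on any combination, so they fall below the cut height, while k3 differs from both by about 70 bp.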
Knowns are grouped together iteratively, so that all groups sharing a common cluster are grouped together, and all knowns that share a common species name are grouped together. In certain cases this may chain together seemingly unrelated groups.
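The iterative joining can be viewed as finding connected components: two knowns end up in the same group if a chain of shared clusters or shared species names links them. The base-R sketch below illustrates that reading with label propagation on hypothetical data; it is not TRAMPR's implementation, and the ", " separator used for group names is an assumption.

```r
# Hypothetical inputs: cluster membership (e.g. from cutree) and the
# species column of the knowns' info data.frame.
knowns <- data.frame(knowns.pk=1:5,
                     species=c("Sp. A", "Sp. B", "Sp. B", "Sp. C", "Sp. D"),
                     cluster=c(1, 1, 2, 3, 2),
                     stringsAsFactors=FALSE)

# Propagate the smallest label through shared clusters and shared species
# until nothing changes (connected components).
group <- knowns$knowns.pk
repeat {
  old <- group
  group <- ave(group, knowns$cluster, FUN=min)
  group <- ave(group, knowns$species, FUN=min)
  if (identical(group, old)) break
}

# Name each group by its unique, sorted species names
group.name <- ave(knowns$species, group,
                  FUN=function(s) paste(sort(unique(s)), collapse=", "))
data.frame(species=knowns$species, group, group.name)
```

Sp. A and Sp. D never share a cluster, yet they end up in one group because Sp. B links cluster 1 to cluster 2: this is the chaining effect mentioned above.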
Because group.knowns is generic, it can be run on either a TRAMPknowns or a TRAMP object. When run on a TRAMP object, it updates the TRAMPknowns object (stored as x$knowns), so that subsequent calls to plot.TRAMPknowns or summary.TRAMPknowns (for example) will use the new grouping parameters.
Parameters set by group.knowns are retained as part of the object, so that when adding additional knowns (add.known and combine), or when subsetting a knowns database (see [.TRAMPknowns, aka TRAMPindexing), the same grouping parameters will be used.
Value
For group.knowns.TRAMPknowns, a new TRAMPknowns object. The cluster.pars element will have been updated with new parameters, if any were specified.
For group.knowns.TRAMP, a new TRAMP object, with an updated knowns element. Note that the original TRAMPknowns object (i.e. the one from which the TRAMP object was constructed) will not be modified.
Warning
Warning about missing data: where there are NA values in certain combinations, NAs may be present in the final distance matrix, which means that hclust cannot be used to generate the clusters. In general, NA values are fine, as long as every pair of knowns has at least one non-missing enzyme/primer combination in common; otherwise the distance for that pair is NA.
References
Avis PG, Dickie IA, Mueller GM 2006. A ‘dirty’ business: testing the limitations of terminal restriction fragment length polymorphism (TRFLP) analysis of soil fungi. Molecular Ecology 15: 873-882.
See Also
TRAMPknowns, which describes the TRAMPknowns object.
build.knowns, which attempts to generate a knowns database from a TRAMPsamples data set.
plot.TRAMPknowns, which graphically displays the relationships between knowns.
Examples
data(demo.knowns)
data(demo.samples)
demo.knowns <- group.knowns(demo.knowns, cut.height=2.5)
plot(demo.knowns)
## Increasing cut.height makes groups more inclusive:
plot(group.knowns(demo.knowns, cut.height=100))
res <- TRAMP(demo.samples, demo.knowns)
m1.ungrouped <- summary(res)
m1.grouped <- summary(res, grouped=TRUE)
ncol(m1.grouped) # 94 groups
res2 <- group.knowns(res, cut.height=100)
m2.ungrouped <- summary(res2)
m2.grouped <- summary(res2, grouped=TRUE)
ncol(m2.grouped) # Now only 38 groups
## group.knowns produces the same distance matrix as TRAMP, so it is
## important to use the same method (e.g. method="maximum") for both.
## The example below shows that the matrix produced by
## dist(summary(x)) (as calculated by group.knowns) is the same as that
## produced by TRAMP:
f <- function(x, method="maximum") {
## Create a pseudo-samples object from our knowns
y <- x
y$data$height <- 1
names(y$info)[names(y$info) == "knowns.pk"] <- "sample.pk"
names(y$data)[names(y$data) == "knowns.fk"] <- "sample.fk"
class(y) <- "TRAMPsamples"
## Run TRAMP, clean up and return
## (If method != "maximum", rescale the error to match that
## generated by dist()).
z <- TRAMP(y, x, method=method)
if ( method != "maximum" ) z$error <- z$error * z$n
names(dimnames(z$error)) <- NULL
z
}
g <- function(x, method="maximum")
as.matrix(dist(summary(x), method=method))
all.equal(f(demo.knowns, "maximum")$error, g(demo.knowns, "maximum"))
all.equal(f(demo.knowns, "euclidian")$error, g(demo.knowns, "euclidian"))
all.equal(f(demo.knowns, "manhattan")$error, g(demo.knowns, "manhattan"))
## However, TRAMP is over 100 times slower in this special case.
system.time(f(demo.knowns))
system.time(g(demo.knowns))
Load ABI Output Files
Description
These functions help convert data from Applied Biosystems Gene Mapper (ABI) output format into TRAMPsamples objects for analysis. Note that this operates on the summarised output (a text file), rather than the .fsa files containing data for individual runs.
Details of the procedure of this function are given below, and a worked example is given in the package vignette; type vignette("TRAMPRdemo") to view it.
The function peakscanner.to.genemapper is an experimental function to convert from peakscanner output to ABI Gene Mapper output. The peakscanner output is very slightly different in format, and currently load.abi is very fussy about the input file's structure. Eventually load.abi will be made more tolerant, but as an interim solution, run peakscanner.to.genemapper on your file. By default, running peakscanner.to.genemapper("myfile.csv") will produce a file myfile.txt. This can then be loaded using load.abi as described below, specifying myfile.txt as the file argument.
Usage
load.abi(file, file.template, file.info, primer.translate, ...)
load.abi.create.template(file, file.template)
load.abi.create.info(file, file.template, file.info)
peakscanner.to.genemapper(filename, output)
Arguments
file |
The name of the file from which the ABI data are to be read. |
file.template |
The name of the “template” file (see Details). |
file.info |
(Optional) the name of the file containing extra information associated with each sample (see Details). |
primer.translate |
List used to translate dye codes into primers. The same codes are assumed to apply across the whole file. See Details for format. |
... |
Additional objects to incorporate into a TRAMPsamples object. |
filename |
In |
output |
In |
Details
Some terminology: a “sample” refers to a physical sample (e.g. a root tip), while a “run” refers to an individual TRFLP run (i.e. one enzyme and one primer). Because two primers are run at once, each “runfile” contains information on two “runs”, but each “sample” may contain more than one “runfile”. Runfiles are distinguished by different sample.file.name values in the ABI file, while different samples are distinguished by different sample.fk/sample.pk values.
primer.translate is a list used to translate between the dyes recorded in the ABI file and the primers used. Each element corresponds to a different primer, and is a vector of different colour dyes. The list:
list(ITS1F="B", ITS4="G")
would translate all dyes with the value "B" to "ITS1F", and all dyes with the value "G" to "ITS4". The list:
list(ITS1F="B", ITS4=c("G", "Y"))
would do the same, except that both "G" and "Y" dyes would be converted to "ITS4". If a dye that is not represented within primer.translate is used in the data, then it will be excluded (e.g., with either list above, all rows of data with dye "R" will be excluded).
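The translation rule above amounts to a lookup from dye code to primer name. The sketch below builds that lookup from a primer.translate-style list; translate.dyes is a hypothetical helper for illustration only, not load.abi's internal code.

```r
# Hypothetical helper (not TRAMPR's implementation): map dye codes to
# primer names using a primer.translate-style list.
translate.dyes <- function(dye, primer.translate) {
  # Flatten list(ITS1F="B", ITS4=c("G", "Y")) into
  # c(B="ITS1F", G="ITS4", Y="ITS4")
  lookup <- rep(names(primer.translate), lengths(primer.translate))
  names(lookup) <- unlist(primer.translate, use.names=FALSE)
  unname(lookup[dye])  # dyes missing from the list give NA
}

dye <- c("B", "G", "Y", "R")
primer <- translate.dyes(dye, list(ITS1F="B", ITS4=c("G", "Y")))
primer                  # c("ITS1F", "ITS4", "ITS4", NA)
keep <- !is.na(primer)  # rows with dye "R" would be dropped
```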
The procedure for loading in ABI data is:
1. Create the “template” file. Template files are required to record which enzymes were used for each run, since that is not included in the ABI output, and to group together separate runs (typically different enzymes) that apply to the same individual. The function load.abi.create.template will create a template that contains all the unique file names found in the ABI file (as sample.file.name), and blank columns titled enzyme and sample.index. Running load.abi.create.template(x), where x is the name of your ABI file, will create a template file in the same directory as the ABI file. The function will print the name and location of the template file to the console.
2. Edit the template file and save it. The enzyme and sample.index columns are initially empty and need filling in, which can be done in Excel or another spreadsheet program. The sample.index column links sample.file.name back to an individual sample; multiple sample.file.names that share sample.index values come from the same individual sample. (If editing with Excel, ignore all the warnings about incompatible file formats when saving.) sample.index should be a positive integer (but see Note below).
3. Optionally create an “info” file, which is useful if you want to associate extra information with your samples. The function load.abi.create.info will create an info file that contains all the unique values of sample.index, and an empty column titled species. The species column can be filled in where the species is known (e.g. from collections of sporocarps). Any additional columns may be added. Running load.abi.create.info(x), where x is the name of your ABI file, will create an info file in the same directory as the ABI file. The function will print the name and location of the info file to the console. Edit and save this file.
4. Create the TRAMPsamples object by running load.abi. This loads your ABI data, together with the completed template file and, optionally, the info file. Running my.samples <- load.abi(x, primer.translate=primer.translate) will create an object “my.samples” containing your data.
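To make the role of the template concrete, here is a base-R sketch of how a filled-in template links each peak to an enzyme and to an individual sample. The data frames abi and template, the file names, and the enzymes are hypothetical, and merge() is used purely for illustration; this is not how load.abi is implemented internally.

```r
# Hypothetical ABI peaks (one row per peak) and a filled-in template
abi <- data.frame(sample.file.name=c("a.fsa", "a.fsa", "b.fsa", "c.fsa"),
                  size=c(102.5, 231.4, 98.7, 250.1),
                  stringsAsFactors=FALSE)
template <- data.frame(sample.file.name=c("a.fsa", "b.fsa", "c.fsa"),
                       enzyme=c("HinfI", "HinfI", "AluI"),
                       sample.index=c(1, 2, 1),
                       stringsAsFactors=FALSE)
# Each peak inherits the enzyme and sample.index of its runfile
peaks <- merge(abi, template, by="sample.file.name")
# Sample 1 combines runfiles a.fsa (HinfI) and c.fsa (AluI)
sort(unique(peaks$enzyme[peaks$sample.index == 1]))
```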
By default, the filenames of the template and info files will be automatically generated: <prefix>.<ext> becomes <prefix>_template.csv or <prefix>_info.csv. If you choose to specify file.template or file.info manually when running load.abi.create.template or load.abi.create.info, you must use the same values of file.template and file.info when running load.abi.
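The default naming rule can be expressed as a small string transformation. abi.companion.files is a hypothetical helper that mirrors the rule as stated above; it is not a TRAMPR function, and it assumes the extension is whatever follows the last period in the file name.

```r
# Hypothetical helper mirroring the naming rule described above:
# <prefix>.<ext> becomes <prefix>_template.csv and <prefix>_info.csv
abi.companion.files <- function(file) {
  prefix <- sub("\\.[^./]*$", "", file)  # drop the final extension only
  c(template=paste0(prefix, "_template.csv"),
    info=paste0(prefix, "_info.csv"))
}

abi.companion.files("data/run2007.txt")
```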
Warning
Do not change the names of any columns produced by load.abi.create.template or load.abi.create.info.
Note
There is no reason that data from other types of output files could not be manually imported using TRAMPsamples. We welcome contributions for other major data formats.
When creating sample.index values, these should be positive integers. If you enter strings (e.g. a1, b1), these will be automatically converted into integers. Once loaded, sample.pk/sample.fk is always a positive integer key, but sample.index will retain your string keys.
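The exact scheme load.abi uses to convert string keys is not described here. As one possible illustration, string keys can be numbered in order of first appearance; the sketch below is an assumption, not the package's documented behaviour.

```r
# Hypothetical string keys as typed into the template's sample.index column
sample.index <- c("a1", "b1", "a1", "c2")
# One possible integer encoding: number keys in order of first appearance
sample.pk <- match(sample.index, unique(sample.index))
sample.pk  # 1 2 1 3
```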
See Also
read.abi, which reads in ABI data with few modifications.
TRAMPsamples, which documents the data type produced by load.abi.
The package vignette, which includes a worked example of loading data using these functions; to locate the vignette, type library(help=TRAMPR) and scroll to the bottom of the page, or type: system.file("doc/TRAMPR_demo.pdf", package="TRAMPR").
Plot a TRAMP Object
Description
Creates a graphical representation of matches performed by TRAMP. The plot displays (1) “matches”, showing how samples match the knowns, and (2) “peak profiles”, showing the locations of peaks for individual enzyme/primer combinations.
Usage
## S3 method for class 'TRAMP'
plot(x, sample.fk, ...)
TRAMP.plotone(x, sample.fk, grouped=FALSE, ignore=FALSE,
all.knowns=TRUE, all.samples=FALSE,
all.samples.global=FALSE, col=1:10,
pch=if (grouped) 15 else 16, xmax=NULL, horiz.lines=TRUE,
mar.default=.5, p.top=.5, p.labels=1/3, cex.axis=NULL,
cex.axis.max=1)
Arguments
x |
A TRAMP object. |
sample.fk |
The |
grouped |
Logical: Should the matched knowns be grouped? |
ignore |
Logical: Should matches marked as ignored by remove.TRAMP.match be excluded? |
all.knowns , all.samples , all.samples.global |
Controls which enzyme/primer combinations are displayed (see Details) |
col |
Vector of colours to plot the different enzyme/primer combinations. There must be at least as many colours as there are different combinations. |
pch |
Plotting symbol to use (see |
xmax |
Maximum size (in base pairs) for the plots to cover.
|
horiz.lines |
Logical: Should horizontal grid lines be used for each matched known? |
The following arguments control the layout and margins of the plot:
mar.default |
Margin size (in lines of text) to surround the plot. |
p.top |
Proportion of the plotting area to be used for the “matches”. The “peak profiles” will share the remaining 1 - p.top of the plotting area. |
p.labels |
Proportion of the plotting area to be used for labels
to the left of the plots. |
cex.axis |
Size of the text used for axes. If NULL, the size is determined automatically (see cex.axis.max). |
cex.axis.max |
Maximum size of the text used for axes, if automatically determining the label size (i.e. when cex.axis is NULL). |
... |
Additional arguments passed to TRAMP.plotone. |
Details
This constructs a plot of a TRAMP fit, illustrating where knowns match the sample data, and which sample peaks remain unmatched.
The top portion of the plot displays “matches”, showing how samples match the knowns. Individual species (or groups, if grouped is TRUE) are represented by different horizontal lines. Where the sample matches a particular known, a symbol is drawn (beware: it may look like only one symbol is drawn when several symbols are plotted on top of one another).
The bottom portion of the plot displays the “peak profile” of the sample, showing the locations and heights of peaks for various enzyme/primer combinations (the exact combinations depend on the values of all.knowns, all.samples and all.samples.global; see below). The height is arbitrary, so units are omitted.
The arguments all.knowns, all.samples and all.samples.global control which enzyme/primer combinations are displayed in the plot. all.knowns=TRUE displays all combinations present in the knowns database, and all.samples=TRUE displays all combinations present in the samples; when all.samples.global=TRUE these are the combinations across the entire samples data set, otherwise only those present in the current sample. At least one of all.knowns and all.samples must be TRUE.
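One way to read these rules is as a union of sets of enzyme/primer combinations. combos.to.show is a hypothetical helper written only to illustrate that reading; it is not part of TRAMPR, the combination labels are invented, and the real plotting code may order or filter combinations differently.

```r
# Hypothetical helper (not part of TRAMPR): which enzyme/primer
# combinations would be displayed under the rules described above.
combos.to.show <- function(knowns.combs, sample.combs, global.combs,
                           all.knowns=TRUE, all.samples=FALSE,
                           all.samples.global=FALSE) {
  if (!(all.knowns || all.samples))
    stop("At least one of all.knowns and all.samples must be TRUE")
  show <- character(0)
  if (all.knowns)
    show <- union(show, knowns.combs)
  # all.samples.global only changes which sample combinations are used
  if (all.samples)
    show <- union(show, if (all.samples.global) global.combs else sample.combs)
  show
}

knowns.combs <- c("HinfI/ITS1F", "HinfI/ITS4")
global.combs <- c("HinfI/ITS1F", "HinfI/ITS4", "AluI/ITS1F")
sample.combs <- c("HinfI/ITS1F", "AluI/ITS1F")
combos.to.show(knowns.combs, sample.combs, global.combs)
combos.to.show(knowns.combs, sample.combs, global.combs, all.samples=TRUE)
```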
Note
While TRAMP.plotone does the actual plotting, it should not be called directly; please use plot(x, sample.fk, ...).
See Also
plot.TRAMPknowns, for plotting TRAMPknowns objects, and plot.TRAMPsamples, for plotting TRAMPsamples objects.
Examples
data(demo.samples)
data(demo.knowns)
res <- TRAMP(demo.samples, demo.knowns)
plot(res, 101)
plot(res, 110)
plot(res, 117)
plot(res, 117, grouped=TRUE)
## Not run:
# Create a PDF file with all matches:
pdf("all_matches.pdf")
plot(res)
dev.off()
## End(Not run)
Summary Plot of Knowns Data
Description
Creates a plot showing the clustering and profiles of a TRAMPknowns object (a “knowns database”). The plot has three vertical panels:
- The leftmost contains a dendrogram, showing how similar the profiles of knowns are (see group.knowns for details).
- The middle panel displays information on the species names and groups of the knowns.
- The rightmost displays the TRFLP profile for each individual (with a different colour symbol for each different enzyme/primer combination).
Usage
## S3 method for class 'TRAMPknowns'
plot(x, cex=1, name="species", pch=1, peaks.col, p=.02,
group.clusters=TRUE, groups.col=1:4, grid.by=5, grid.col="gray",
widths=c(1, 2, 1), ...)
Arguments
x |
A TRAMPknowns object. |
cex |
Character size for the plot. Because knowns databases can be large, this should be small and may need to be adjusted. Most aspects of the plot will scale with this. |
name |
Column name to use when generating species names; must be
one of |
pch |
Plotting symbol to use for peaks in the peak profiles. |
peaks.col |
Vector of colours to plot the different enzymes in
the peak profiles. These will be used in the order of the columns
of |
p |
Scaling factor for the middle plot; this specifies the
proportion of the width that elements are spaced horizontally from
one another. Columns of text are |
group.clusters |
Logical: Should groups of clusters (determined
by |
groups.col |
Vector of colours to plot different group clusters in. This will be recycled as necessary. |
grid.by |
Interval between horizontal grid lines. Grid lines
start at |
grid.col |
Colour of the horizontal grid lines. |
widths |
Relative widths of the three panels of the plot (see
|
... |
Additional arguments (ignored). |
Note
In general, there will probably be too many knowns to make a legible plot when displayed on the screen. We recommend creating a PDF of the plot and viewing that instead (see Example).
When plotted on the interactive plotting device, the plot is likely to look distorted if the plotting window is resized.
See Also
group.knowns, which controls the grouping of knowns, and TRAMPknowns, which constructs TRAMPknowns objects.
Examples
data(demo.knowns)
plot(demo.knowns)
## Not run:
pdf("knowns_summary.pdf", paper="default", width=8, height=11)
plot(demo.knowns)
plot(demo.knowns, group.clusters=FALSE)
dev.off()
## End(Not run)
Plot a TRAMPsamples Object
Description
Displays the peak profiles of samples in a TRAMPsamples object, showing the locations and heights of peaks for individual enzyme/primer combinations. This is the same information that is displayed in the bottom portion of a plot.TRAMP plot, but may be useful where a TRAMP fit has not been performed yet (e.g. before a knowns database has been constructed).
Usage
## S3 method for class 'TRAMPsamples'
plot(x, sample.fk, ...)
TRAMPsamples.plotone(x, sample.fk, all.samples.global=FALSE, col=1:10,
xmax=NULL, mar.default=.5, mar.labels=8, cex=1)
Arguments
x |
A TRAMPsamples object. |
sample.fk |
The |
all.samples.global |
Logical: Should plots be set up for all
enzyme/primer combinations present in |
col |
Vector of colours to plot the different enzyme/primer combinations. There must be at least as many colours as there are different combinations. |
xmax |
Maximum size (in base pairs) for the plots to cover.
|
mar.default |
Margin size (in lines of text) to surround the plot. |
mar.labels |
Number of lines of text to be used for labels to the left of the plots. Increase this if labels are being truncated. |
cex |
Scaling factor for text. |
... |
Additional arguments (ignored). |
See Also
plot.TRAMP, the plotting method for TRAMP objects, and plot.TRAMPknowns, for TRAMPknowns objects.
Examples
data(demo.samples)
plot(demo.samples, 101)
plot(demo.samples, 117)
## Not run:
# Create a PDF file with all profiles:
pdf("all_profiles.pdf")
plot(demo.samples)
dev.off()
## End(Not run)
Read ABI Output Files
Description
Read an Applied Biosystems Gene Mapper (ABI) output file, and prepare it for analysis. Note that this operates on the summarised output (a text file), rather than the .fsa files containing data for individual runs.
Usage
read.abi(file)
Arguments
file |
The name of the file from which the data are to be read. |
Details
The ABI file format contains a few features that make it difficult to interact with directly, so read.abi provides a wrapper around read.table to work around these. The three issues are (1) trailing tab characters, (2) mixed case and punctuation in column names, and (3) parsing the “Dye/Sample Peak” column.
Because each line of an ABI file contains a trailing tab character (\t), read.table fails to read the file correctly.
read.abi renames all columns so that all non-alphanumeric characters become periods, and all uppercase letters are converted to lower case.
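The first two issues can be sketched in base R on a few toy lines. Stripping the trailing tab with sub() before parsing is an assumption about one way to cope with it, not a description of read.abi's internals; the renaming step follows the rule stated above.

```r
# Toy ABI-style text: every line ends in a trailing tab (hypothetical data)
abi.lines <- c("Sample File Name\tDye/Sample Peak\tSize\tHeight\t",
               "run1.fsa\tB,1\t102.5\t850\t",
               "run1.fsa\tG,1\t231.4\t430\t")
# Strip the trailing tab from each line, then parse as tab-separated text
abi.raw <- read.table(text=sub("\t$", "", abi.lines), header=TRUE,
                      sep="\t", check.names=FALSE,
                      stringsAsFactors=FALSE)
# Non-alphanumeric characters become periods; upper case becomes lower case
names(abi.raw) <- tolower(gsub("[^[:alnum:]]", ".", names(abi.raw)))
names(abi.raw)  # "sample.file.name" "dye.sample.peak" "size" "height"
```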
The column Dye/Sample Peak contains data of the form <Dye>,<Sample Peak>, where <Dye> is a code for the dye colour used and <Sample Peak> is an integer indicating the order of the peaks. Entries where the contents of Dye/Sample Peak terminate in a "*" character (indicating an internal size standard) are automatically excluded from the analysis.
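Splitting this column and dropping the size-standard entries can be sketched as follows. The values are hypothetical, and this is an illustration of the described format rather than read.abi's own parsing code.

```r
# Hypothetical "Dye/Sample Peak" values; "*" marks an internal size standard
dsp <- c("B,1", "B,2", "G,1", "O,1*")
keep <- !grepl("\\*$", dsp)                  # drop size-standard entries
parts <- strsplit(dsp[keep], ",", fixed=TRUE)
dye <- sapply(parts, `[`, 1)                 # code for the dye colour
sample.peak <- as.integer(sapply(parts, `[`, 2))  # order of the peak
data.frame(dye, sample.peak)
```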
The final column names are:
- sample.file.name: Name of the file containing data.
- size: Size of the peak (in base pairs).
- height: Height of the peak (arbitrary units).
- dye: Code for dye used.
- sample.peak: Rank of peak within current sample.
In addition, other column names may be retained from ABI output, but not used.
Note
There is no reason that data from other types of output files could not be manually imported using TRAMPsamples. We welcome contributions for other major data formats.
See Also
load.abi, which attempts to construct a TRAMPsamples object from an ABI file (with a bit of user intervention).
Read/Write TRAMPknowns and TRAMPsamples Objects
Description
Saves and loads TRAMPknowns and TRAMPsamples objects as a series of “csv” (comma separated value) files for external editing.
If you do not want to edit your data, then saving with save is preferable; it is faster, creates smaller files, and will save any additional components in the objects (see Examples).
Usage
read.TRAMPknowns(file.pat, auto.save=TRUE, overwrite=FALSE)
write.TRAMPknowns(x, file.pat=x$file.pat, warn=TRUE)
read.TRAMPsamples(file.pat)
write.TRAMPsamples(x, file.pat)
Arguments
x |
A TRAMPknowns or TRAMPsamples object, as appropriate. |
file.pat |
Pattern, with the filename prefix: “info” and
“data” objects will be read/written as
|
auto.save |
Logical: Should
where |
overwrite |
Should previous backup files be overwritten when creating new backups? |
warn |
Should the function warn when no filename is given?
(Because this function is called automatically when adding new
knowns, and because |
Details
file.pat may contain a path. It is best to use forward slashes as directory separators (path/to/file), but on Windows (only), double backslashes will also work (path\\to\\file).
Paths may be either relative (e.g. path/to/file) or absolute (e.g. /path/to/file, or x:/path/to/file on Windows).
See Also
load.abi, for semi-automatic loading of ABI output files.
save and load, for saving and loading arbitrary R objects.
Examples
## Not run:
# Preferred way of saving/loading objects, if editing is not required:
save(demo.knowns, file="my_knowns.Rdata")
# (possibly in a different session, but _after_ loading TRAMPR)
load("my_knowns.Rdata") # -> creates 'demo.knowns' in global environment
## End(Not run)
Rebuild a TRAMP Object
Description
This function rebuilds a TRAMP object. Typically this will be called automatically after adding knowns (see add.known); there should be little need to call it manually. The same parameters that were used in the original call to TRAMP are used again, and these cannot currently be modified during this call.
Usage
rebuild.TRAMP(x)
Arguments
x |
A TRAMP object. |
Value
A new TRAMP object, with all components recalculated.
Mark a TRAMP Match as Ignored
Description
Mark a match in a TRAMP object as ignored. Once marked, the match will be excluded from presence/absence matrices (see summary.TRAMP) and from plots (plot.TRAMP) when these are called with ignore=TRUE.
update.TRAMP provides an interactive interface for doing this, but remove.TRAMP.match may be useful directly.
Usage
remove.TRAMP.match(x, sample.fk, knowns.fk)
Arguments
x |
A TRAMP object. |
sample.fk , knowns.fk |
Key of sample and known, respectively.
See |
Value
A modified TRAMP object.
Warning
This should be regarded as experimental. There is currently no mechanism for restoring ignored matches, aside from recreating the TRAMP object or editing x$presence.ign directly (the format of that table is self-explanatory, but may change between TRAMPR versions). Note that by default, summary.TRAMP and plot.TRAMP will not remove matches; you must specify ignore=TRUE to enable this.
Note
This function returns a modified object; the TRAMP object is not modified in place. You must do:
x <- remove.TRAMP.match(x, sample.fk, knowns.fk)
to mark a match as ignored in the object x.
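The need to reassign follows from R's copy-on-modify semantics: a function changes its own copy of an argument, never the caller's object. The toy function mark.ignored below is hypothetical and only stands in for remove.TRAMP.match to show the pattern.

```r
# Hypothetical stand-in for a function that returns a modified object
mark.ignored <- function(x, id) {
  x$ignored <- c(x$ignored, id)  # modifies the function's local copy
  x
}
obj <- list(ignored=integer(0))
mark.ignored(obj, 5L)         # result discarded: obj is unchanged
length(obj$ignored)           # 0
obj <- mark.ignored(obj, 5L)  # reassign to keep the change
obj$ignored                   # 5
```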
Create Presence/Absence Matrices from TRAMP Objects
Description
Generate a summary of a TRAMP object, by producing a presence/absence matrix. This is the preferred way of extracting the presence/absence matrix from a TRAMP object, and allows for grouping, naming knowns, and ignoring matches (specified by remove.TRAMP.match).
Usage
## S3 method for class 'TRAMP'
summary(object, name=FALSE, grouped=FALSE, ignore=FALSE, ...)
Arguments
object |
A TRAMP object. |
name |
Logical: Should the knowns be named? |
grouped |
Logical: Should the knowns be grouped? |
ignore |
Logical: Should matches marked as ignored be excluded? |
... |
Further arguments passed to or from other methods. |
Value
A presence/absence matrix, with samples as rows and knowns as columns. If name is TRUE, then names of knowns (or groups of knowns) are used as column names; otherwise the knowns.fk values are used (or group.strict values, if grouped). If grouped is TRUE, then the knowns are collapsed by group (using group.strict; see group.knowns). A group is present if any of the knowns belonging to it are present. If ignore is TRUE, then any matches marked by remove.TRAMP.match are excluded.
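The rule that a group is present if any of its knowns are present can be sketched on a toy matrix by summing columns within each group and testing for a positive total. The matrix and the group assignment are hypothetical, and this is an illustration of the rule rather than summary.TRAMP's own code.

```r
# Toy presence/absence matrix: samples as rows, knowns as columns
pa <- matrix(c(1, 0, 0,
               0, 1, 0,
               0, 0, 0), nrow=3, byrow=TRUE,
             dimnames=list(c("s1", "s2", "s3"), c("k1", "k2", "k3")))
group <- c(k1="A", k2="A", k3="B")  # hypothetical group of each known
# Sum knowns within each group; a positive total means the group is present
pa.grouped <- t(rowsum(t(pa), group[colnames(pa)]) > 0) * 1
pa.grouped
```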
Examples
data(demo.knowns)
data(demo.samples)
res <- TRAMP(demo.samples, demo.knowns)
head(summary(res))
head(summary(res, name=TRUE))
head(summary(res, name=TRUE, grouped=TRUE))
## Extract the species richness for each sample (i.e. the number of
## knowns present in each sample)
rowSums(summary(res, grouped=TRUE))
## Extract species frequencies and plot a rank abundance diagram:
## (i.e. the number of times each known was recorded)
sp.freq <- colSums(summary(res, name=TRUE, grouped=TRUE))
sp.freq <- sort(sp.freq[sp.freq > 0], decreasing=TRUE)
plot(sp.freq, xlab="Species rank", ylab="Species frequency", log="y")
text(1:2, sp.freq[1:2], names(sp.freq[1:2]), cex=.7, pos=4, font=3)
Interactively Alter a TRAMP Object
Description
This function allows some manual checking and correction of a TRAMP object. By default, it steps through each sample, and offers to (1) add a new known to the TRAMPknowns database within the TRAMP object (see add.known for details), (2) mark matches to be ignored in future calls to plot.TRAMP (see remove.TRAMP.match), and (3) save the current plot as a PDF.
Usage
## S3 method for class 'TRAMP'
update(object, sample.fk=labels(object$samples), grouped=FALSE,
ignore=TRUE, delay.rebuild=FALSE, default.species=NULL,
filename.fmt="TRAMP_%d.pdf", ...)
Arguments
object |
A TRAMP object. |
sample.fk |
A vector of |
grouped , ignore |
Plotting parameters, as in plot.TRAMP. |
delay.rebuild |
Logical: Should the rebuild of the |
default.species |
Default species name for newly added knowns.
Passed to |
filename.fmt |
Format used to generate filenames when saving
PDFs. Include a |
... |
Further arguments passed to the plotting function
|
Warning
If an error occurs while running update, all modifications will be lost.
Note
update.TRAMP returns a modified TRAMP object, and does not modify the original TRAMP object in place. You must use it like:
x <- update(x)
to modify the original object, or
x2 <- update(x)
to create a new, modified object. Note that when creating multiple objects, if the TRAMPknowns object has a file.pat element, then any changes to either x or x2 will be written back to file, but the knowns contained in x and x2 may differ. See the note in add.known.
The action “Quit” will always exit the update function and save the object.
Be careful when using a TRAMP object whose TRAMPknowns element has a file.pat element; new knowns added will be immediately written to file.
Examples
## Since this function runs interactively, there can be no example.