Version: | 0.0.12 |
Date: | 2024-04-23 |
Title: | Interfaces to Phylogenetic Software in R |
Depends: | R (≥ 3.5.0), ape |
Imports: | data.table, phangorn, plyr, XML |
Description: | Functions that wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states. |
License: | GPL-3 |
Language: | en-US |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Repository: | CRAN |
RoxygenNote: | 7.3.1 |
Packaged: | 2024-04-23 19:15:51 UTC; christoph |
Author: | Christoph Heibl [aut, cre], Natalie Cusimano [aut], Franz-Sebastian Krah [aut] |
Maintainer: | Christoph Heibl <christoph.heibl@gmx.net> |
Date/Publication: | 2024-04-23 19:50:02 UTC |
Interfaces to Phylogenetic Software
Description
This package presents a set of functions that were formerly included in the phyloch package and which wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states.
Details
Package: | ips |
Type: | Package |
Version: | 0.0.12 |
Date: | 2019-11-14 |
License: | GPL (>= 2) |
There are several functions for reading and writing DNA sequences in FASTA, PHYLIP, and NEXUS format: read.fas
, read.phy
, read.nex
, write.fas
, write.phy
, and write.nex
.
Some functions are available for integrating BEAST with R. XML input files for BEAST can be generated with rbeauti
. Two functions are designed to read TreeAnnotator output: read.beast
will render an object of class phylo
with additional node statistics appended as list elements. These additional node statistics will be lost be the subsequent use of ladderize
or rotate
(or similar functions that change the ordering of internal nodes).read.beast.table
also parses the TreeAnnotator output, but returns a matrix of node statistics.
This package itself does not implement techniques for phylogenetic analyses, but provides a series of wrappers for commonly used software packages. Sequence alignment can be done with the mafft
and prank
; cleaning of sequences with gblocks
and aliscore
. The function raxml
and mrbayes
are intended for phylogenetic tree search. Running mrbayes
with argument run = FALSE
can be used to create MrBayes-executable NEXUS files. Finally, wrappers is provided for Multistate
in the BayesTraits
package (see multistateML
and multistateMCMC
).
Several plotting functions (HPDbars
, clade.bars
, box.clades
, box.tips
, tip.color
, edge.color
have been moved to the viper package.
Author(s)
Natalie Cusimano, Christoph Heibl, Franz-Sebastian Krah, Maintainer: Christoph Heibl (christoph.heibl@gmx.net)
See Also
Conversion of DNAbin to Index
Description
Extract the indices of non-empty positions in a sample of DNA sequences to
Usage
DNAbin2index(x)
Arguments
x |
A matrix of class |
See Also
Identify/Delete Spurious Rows and Columns from DNA Alignments
Description
After subsetting (see e.g. DNAbin
), DNA sequence
alignments can contain rows and columns that consist entirely of missing
and/or ambiguous character states. identifyEmptyCells
will identify
and deleteEmptyCells
will delete all such rows (taxa) and columns
(characters) from a DNA sequence alignment.
Usage
deleteEmptyCells(
DNAbin,
margin = c(1, 2),
nset = c("-", "n", "?"),
quiet = FALSE
)
identifyEmptyCells(
DNAbin,
margin = c(1, 2),
nset = c("-", "n", "?"),
quiet = FALSE
)
Arguments
DNAbin |
An object of class |
margin |
A vector giving the subscripts the function will be applied
over: |
nset |
A vector of mode character; rows or columns that consist
only of the characters given in |
quiet |
Logical: if set to |
Details
For faster execution, deleteEmptyCells
handles sequences in
ape's bit-level coding scheme.
Value
An object of class DNAbin
.
References
Cornish-Bowden, A. 1984. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. 13: 3021–3030.
See Also
Examples
# COX1 sequences of bark beetles
data(ips.cox1)
# introduce completely ambiguous rows and colums
x <- as.character(ips.cox1[1:6, 1:60])
x[3, ] <- rep("n", 60)
x[, 20:24] <- rep("-", 6)
x <- as.DNAbin(x)
image(x)
# identify those rows and colums
(id <- identifyEmptyCells(x))
xx <- x[-id$row, -id$col]
# delete those rows and colums
x <- deleteEmptyCells(x)
image(x)
identical(x, xx)
Masking of Sequence Alignments with ALISCORE
Description
Provides a interface to Aliscore, in order to remove problematic regions of a DNA sequence alignment.
Usage
aliscore(x, gaps = "5state", w = 6, r, t, l, s, o, exec)
Arguments
x |
DNA sequences of class |
gaps |
A vector of mode |
w |
An integer giving the size of the sliding window. |
r |
An integer giving the number of random pairwise sequence
comparisons; defaults to |
t |
Not yet implemented. |
l |
Not yet implemented. |
s |
Not yet implemented. |
o |
A vector of mode |
exec |
A character string, giving the path to the Aliscore script. |
Value
A matrix
of class "DNAbin"
.
Note
This function was developed with ALISCORE version 2.
References
Misof, B. and K. Misof. 2009. A Monte Carlo approach successfully identifies randomness in multiple sequence alignments: a more objective means of data exclusion. Syst. Biol. 58: 21–34.
Kueck, P., K. Meusemann, J. Dambach, B. Thormann, B.M. von Reumont, J.W. Waegele and B. Misof. 2010. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees. Frontiers in Zoology 7: 10.
Aliscore website: https://bonn.leibniz-lib.de/en/research/research-centres-and-groups/aliscore
See Also
mafft
and prank
for multiple sequence
alignment; gblocks
for another alignment masking algorithm.
Examples
data(ips.28S)
## Not run: aliscore(ips.28S)
XML Parameter Nodes
Description
A collection of parameter nodes used in the standard template of BEAST2.
Usage
beastLog(l, x, i)
Arguments
l |
A character string giving the name (i.e. |
x |
A list, internally generated by |
i |
Integer giving the index to a partition, internally generated by |
Value
An object of class XMLNode
.
Nucleotide-Nucleotide BLAST
Description
Provides an interface to BLASTN
Usage
blastn(query, db)
Arguments
query |
An object of class |
db |
An object of class |
Simple Gap/Indel Coding
Description
code.simple.gaps
takes an aligned DNA sequence matrix and codes the simple gaps, i.e. gaps that do not overlap with other gaps. The gapped positions are excluded from the matrix and the coded gap characters are appended to the matrix.
Usage
code.simple.gaps(x, append = TRUE)
Arguments
x |
An object of class |
append |
Logical. |
Value
An object of class DNAbin.
Author(s)
Christoph Heibl
References
Simmons, M.P. & H. Ochoterena. 2000. Gaps as characters in sequence-based phylogenetic analyses. Systematic Biology 49(2): 369–381.
See Also
deleteGaps
, deleteEmptyCells
, trimEnds
Collapse Unsupported Edges/Branches in a Phylogeny
Description
Given a set of node support values (e.g., bootstrap proportions, posterior probabilities) and a certain threshold, all edges receiving less support than the threshold will be collapsed.
Usage
collapseUnsupportedEdges(phy, value = "node.label", cutoff)
Arguments
phy |
An object of class |
value |
A character string giving the name of the list element
that contains the support values; default is |
cutoff |
A numeric value giving the threshold below which edges will be collapsed. |
Value
An object of class phylo
.
Examples
## phylogeny of bark beetles
data(ips.tree)
## non-parametric bootstrap proportions (BP)
ips.tree$node.label
## collapse clades with < 70 BP
tr <- collapseUnsupportedEdges(ips.tree, "node.label", 70)
## show new topology
par(mfrow = c(1, 2))
plot(ips.tree, no.margin = TRUE)
nodelabels(ips.tree$node.label, cex = .5, frame = "n", adj = c(0, .5))
plot(tr, no.margin = TRUE)
Graft Polytomies on Tips of Phylogeny
Description
Graft polytomies on the tips of a class phylo
object.
Usage
combMyTree(phy, data, brlen = 0, annotate = FALSE)
Arguments
phy |
An object of class |
data |
A data frame containing two columns. The entries of one column must be identical to the tip labels of the phylogeny; the other column contains the new tip labels. The column are matched automatically. |
brlen |
A numeric giving the branch lengths for the polytomies. |
annotate |
Logical, if |
Value
An object of class phylo
with nrow(data)
tips.
See Also
Examples
data(ips.tree)
## Simulate varying number of intraspecific observations
s <- sapply(1:Ntip(ips.tree), sample, x = 1:3, size = 1)
x <- rep(ips.tree$tip.label, times = s)
x <- data.frame(x, paste0(x, unlist(lapply(s, function(z) 1:z))))
## Create polytomies
tre <- combMyTree(ips.tree, x)
plot(tre, no.margin = TRUE, cex =.5)
Delete Missing Data from DNA Sequences
Description
Remove gaps ("-") and/or missing and ambiguous data ("N", "?") from a sample of DNA sequences.
Usage
del.miss(x)
Arguments
x |
A matrix, a list, or a vector of class |
Value
A list or a vector of class DNAbin
.
Remove Gap Positions From DNA Sequences
Description
Remove indel positions (or gaps) from a DNA sequence
alignment. For faster execution, deleteGaps
handles sequences
in ape's bit-level coding scheme.
Usage
deleteGaps(x, gap.max = nrow(x) - 4)
Arguments
x |
An object of class |
gap.max |
An integer, which gives the maximum number of
gap characters ("-") that will be tolerated at any given alignment
position (column). Only values between |
Details
The default, nmax = nrow(x) - 4
, removes all those
positions from the alignment, which contain at least four non-gap
characters, which is the minimum number of sequences needed to
produce a non-trivial unrooted topology. All gaps will be excluded
by selecting nmax = 0
and half of all gaps with nmax =
nrow(x) / 2
.
In contrast, del.gaps
removes all gap characters
from the alignment, so most probably the result will not be a set of
sequences of equal length and the matrix will be coerced to a list.
Value
An object of class DNAbin.
See Also
code.simple.gaps
for coding of simple gaps,
del.gaps
for removal of all gap symbols from an
alignment, gblocks
and aliscore
for more
sophisticated methods of cleaning/masking alignments.
Descendants of an Internal Node in a Phylogeny
Description
For any given internal node of a phylogeny, the function returns a vector containing the node numbers descending from that node.
Usage
descendants(phy, node, type = "t", ignore.tip = TRUE, labels = FALSE)
Arguments
phy |
an object of class |
node |
an integer giving the number of the internal node. |
type |
a character string, may be |
ignore.tip |
logical, if |
labels |
logical, determines if node labels are returned instead of node number, currently ignored unless |
Value
A vector containing terminal node numbers or tip labels.
Author(s)
Christoph Heibl
See Also
Examples
# generate a random tree with 12 terminal and 11 internal nodes:
tree <- rtree(12)
# get the descendants of internal node 15:
x <- descendants(tree, 15)
Identification of Stem-Lineage-Edges and MRCAs
Description
noi
(node of interest)
identifies the most recent common ancestor (MRCA) and eoi
(edge of interest) its subtending stem-lineage
edge of one or more sets of taxa/tips.
Usage
eoi(phy, node, group, regex = FALSE, stem = FALSE, monophyletic = FALSE)
noi(phy, group, regex = FALSE, stem = FALSE, monophyletic = FALSE)
Arguments
phy |
An object of class |
node |
A vector of mode |
group |
A vector or list of vectors of mode |
regex |
A logical, if |
stem |
Logical, ... |
monophyletic |
Logical, ... |
Value
A vector of mode "numeric"
containing node numbers.
See Also
mrca
; descendants
for the contrary
operation to noi
.
Examples
# molecular phylogeny of Ips bark beetles
# ---------------------------------------
data(ips.tree)
ips.tree <- ladderize(ips.tree)
ips.tree <- fixNodes(ips.tree)
clade1 <- descendants(ips.tree, 44, labels = TRUE)
mrca <- noi(ips.tree, clade1)
stem_lineage <- eoi(ips.tree, mrca)
ecol <- rep("black", Nedge(ips.tree))
ecol[stem_lineage] <- "red"
plot(ips.tree, no.margin = TRUE, edge.color = ecol)
nodelabels(node = mrca, pch = 22, col = "blue")
#gen <- sapply(viperidae$tip.label, function(x) unlist(strsplit(x, "_"))[1])
#tax <- data.frame(genus = gen, species = viperidae$tip.label, row.names = NULL)
# group can be a list
# -------------------
#myclades <- split(tax$species, tax$genus)
#nds <- noi(viperidae, myclades)
#plot(viperidae)
#nodeInfo(nds)
# group might contain tip numbers
# -------------------------------
#group <- list(c(17, 22), c(13, 1))
#plot(viperidae)
#append2tips(phy, tips = unlist(group), pch = 19)
#nds <- noi(viperidae, myclades)
#nodeInfo(nds)
# the 'group' argument can also take regular expressions
# ------------------------------------------------------
#rex <- "aspis"
#node <- noi(viperidae, rex, regex = TRUE)
#plot.phylo(viperidae, tip.color = 0, edge.color = 0)
#box.clades(viperidae, nodes = node, col = "#D2A6A7", align = "all")
#plot.phylo.upon(viperidae)
#nodelabels(node = node, pch = 21, cex = 1.2, col = "red", bg = "#D2A6A7")
# if the 'group' argument is a list of elements of length 2,
# n = length(group) nodes of interest will be returned
# ----------------------------------------------------
#group <- list(
# c("Vipera_berus", "Vipera_ursinii"),
# c("Vipera_aspis_ssp._aspis", "Vipera_latastei"),
# c("Vipera_ammodytes_ssp._ammodytes",
# "Vipera_ammodytes_ssp._montandoni"),
# c("Macrovipera_lebetina", "Vipera_wagneri")
#)
#clades <- noi(viperidae, group)
#plot.phylo(viperidae, tip.color = 0, edge.color = 0)
#box.clades(viperidae, nodes = clades, col = c("#FFFFA5", "#D2A6A7",
# "#A7D2A5", "#A5A6D2"), align = "all")
#plot.phylo.upon(viperidae)
Standard Node Numbering in Phylo Objects
Description
The function (re-)establishes the standard numbering of terminal and internal nodes in phylogenies represented as objects of class phylo
.
Usage
fixNodes(phy)
Arguments
phy |
An object of class |
Details
When reading phylogenetic trees from a NEXUS file that contains a translate
section, it can happen that the terminal nodes (tips, leaves) of the corresponding phylo
object are not numbered consecutively, which can be a problem in some downstream applications. You can use fixNodes
to get the correct order of terminal node numbers.
fixNodes
is also intended to re-establish the standard numbering of internal nodes and reorder all node value elements (e.g. node.label, posterior, ...) if a phylo
object has been modified by either root
, ladderize
, or rotate
.
Value
An object of class phylo
.
Note
fixNodes
has been completely rewritten for ips version 1.0-0. It should now run absolutely stable and is much quicker. Nevertheless, I recommend checking carefully the results of fixNodes
, until the function has been tested by a number of users. Then this comment will be removed.
Author(s)
Christoph Heibl
See Also
read.tree
,
read.nexus
, read.beast
for reading trees in NEWICK and NEXUS format; ladderize
and rotate
for tree manipulation; node.support
for plotting node support values has been moved to package viper.
Equal Tip Heights
Description
Modify terminal edge lengths to create "exactly" (see Details) equal tip heights (sum of edge lengths from root to tip)
Usage
forceEqualTipHeights(phy, baseline = "mean")
Arguments
phy |
An object of class |
baseline |
A character string giving a function to calculate the
baseline tip height, e.g. |
Details
What is "exactly" equal depends on the precision of the system
(.Machine
); in any case the resulting phylogeny will pass
is.ultrametric
with default arguments.
Value
An object of class phylo
with changed terminal
edge lengths.
Note
forceEqualTipHeights
is only intended to correct small rounding
errors in edge lengths, not to make an additive phylogeny ultrametric. For
the latter, see e.g. chronos
.
See Also
Masking of Sequence Alignments with GBLOCKS
Description
Provides a wrapper to Gblocks, a computer program written in ANSI C language that eliminates poorly aligned positions and divergent regions of an alignment of DNA or protein sequences. Gblocks selects conserved blocks from a multiple alignment according to a set of features of the alignment positions.
Usage
gblocks(
x,
b1 = 0.5,
b2 = b1,
b3 = ncol(x),
b4 = 2,
b5 = "a",
target = "alignment",
exec
)
Arguments
x |
A matrix of DNA sequences of classes |
b1 |
A real number, the minimum number of sequences for a conserved position given as a fraction. Values between 0.5 and 1.0 are allowed. Larger values will decrease the number of selected positions, i.e. are more conservative. Defaults to 0.5 |
b2 |
A real number, the minimum number of sequences for a flank
position given as a fraction. Values must be equal or larger than
|
b3 |
An integer, the maximum number of contiguous nonconserved positions; any integer is allowed. Larger values will increase the number of selected position, i.e. are less conservative. Defaults to the number of positions in the alignment. |
b4 |
An integer, the minimum length of a block, any integer equal to or bigger than 2 is allowed. Larger values will decrease the number of selected positions, i.e. are more conservative. Defaults to 2. |
b5 |
A character string indicating the treatment of gap
positions. Three choices are possible. 1. |
target |
A vector of mode |
exec |
A character string indicating the path to the GBLOCKS executable. |
Details
Explanation of the routine taken from the Online Documentation: First, the degree of conservation of every positions of the multiple alignment is evaluated and classified as nonconserved, conserved, or highly conserved. All stretches of contiguous nonconserved positions bigger than a certain value (b3) are rejected. In such stretches, alignments are normally ambiguous and, even when in some cases a unique alignment could be given, multiple hidden substitutions make them inadequate for phylogenetic analysis. In the remaining blocks, flanks are examined and positions are removed until blocks are surrounded by highly conserved positions at both flanks. This way, selected blocks are anchored by positions that can be aligned with high confidence. Then, all gap positions -that can be defined in three different ways (b5)- are removed. Furthermore, nonconserved positions adjacent to a gap position are also eliminated until a conserved position is reached, because regions adjacent to a gap are the most difficult to align. Finally, small blocks (falling below the limit of b4) remaining after gap cleaning are also removed.
Value
A matrix
of class "DNAbin"
Note
gblocks
was last updated and tested to work with Gblocks 0.91b.
If you have problems getting the function to work with a newer version of
Gblocks, please contact the package maintainer.
References
Castresana, J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17, 540-552.
Talavera, G., and J. Castresana. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564-577.
Gblocks website: https://www.biologiaevolutiva.org/jcastresana/Gblocks.html
See Also
mafft
and prank
for multiple sequence
alignment; aliscore
for another alignment masking algorithm.
Examples
data(ips.28S)
## Not run: gblocks(ips.28S)
Conversion of Index to DNAbin
Description
Use indices of non-empty positions to convert a list of DNA sequences into a matrix.
Usage
index2DNAbin(DNAbin, index)
Arguments
DNAbin |
A list of class |
index |
A list of integers containing the indices of base positions. |
See Also
Internal IPS Functions
Description
Internal ips functions.
Note
These are internal functions in ips and are not intended to be called by the user.
Bark Beetle 16S Sequences
Description
This DNA alignment contains 376 positions of 42 sequences of 16S ribosomal DNA of the bark beetle genera Ips, Orthotomicus, and Pityogenes (Scolytinae, Curculionidae, Coleoptera).
Usage
data(ips.16S)
Format
The sequences are stored in binary format (see DNAbin
).
Source
The sequences were downloaded and assembled from the Nucleotide repository at GenBank on February 8, 2014.
References
The nucleotide database on the NCBI website: https://www.ncbi.nlm.nih.gov/nuccore
Examples
data(ips.16S)
Bark Beetle 28S Sequences
Description
This DNA alignment contains 562 positions of 28 sequences of 28S ribosomal DNA of the bark beetle genus Ips (Scolytinae, Curculionidae, Coleoptera).
Usage
data(ips.28S)
Format
The sequences are stored in binary format (see DNAbin
).
Source
The sequences were downloaded and assembled from the Nucleotide repository at GenBank on February 8, 2014.
References
The nucleotide database on the NCBI website: https://www.ncbi.nlm.nih.gov/nuccore
Examples
data(ips.28S)
Bark Beetle COX1 Sequences
Description
This DNA alignment contains 770 positions of 26 sequences of cox1 of the bark beetle genera Ips, Orthotomicus, and Pityogenes (Scolytinae, Curculionidae, Coleoptera).
Usage
data(ips.cox1)
Format
The sequences are stored in binary format (see DNAbin
).
Source
The sequences were downloaded and assembled from the Nucleotide repository at GenBank on February 8, 2014.
References
The nucleotide database on the NCBI website: https://www.ncbi.nlm.nih.gov/nuccore
Examples
data(ips.cox1)
Ips Phylogeny
Description
Phylogenetic tree of bark beetles (genus Ips).
Usage
data(ips.tree)
Format
The format is: List of 5 $ edge : int [1:72, 1:2] 38 39 39 40 41 42 42 43 44 45 ... $ Nnode : int 36 $ tip.label : chr [1:37] "Ips_acuminatus" "Ips_duplicatus" "Ips_integer" "Ips_plastographus" ... $ edge.length: num [1:72] 0.2806 0.0727 0.0295 0.0097 0.021 ... $ node.label : chr [1:36] "" "100" "21" "12" ... - attr(*, "class")= chr "phylo" - attr(*, "order")= chr "cladewise"
Examples
data(ips.tree)
plot(ips.tree)
Sequence Alignment with MAFFT
Description
This function is a wrapper for MAFFT and can be used for (profile) aligning of DNA and amino acid sequences.
Usage
mafft(
x,
y,
add,
method = "auto",
maxiterate = 0,
op = 1.53,
ep = 0,
gt = NULL,
options,
thread = -1,
exec,
quiet,
file
)
Arguments
x |
An object of class |
y |
An object of class |
add |
A character string giving the method used for adding |
method |
A character string giving the alignment method. Available
accuracy-oriented methods for less than 200 sequences are
|
maxiterate |
An integer giving the number of cycles of iterative
refinement to perform. Possible choices are |
op |
A numeric giving the |
ep |
A numeric giving the offset value, which works like |
gt |
An object of class |
options |
A vector of mode character specifying additional arguments to
MAFFT, that are not included in |
thread |
Integer giving the number of physical cores MAFFT should use;
with |
exec |
A character string giving the path to the MAFFT executable
including its name, e.g. something like |
quiet |
Logical, if set to |
file |
A character string indicating the filename of the output FASTA
file; if this is missing the the alignment will be returned as matrix of
class |
Details
"localpair"
selects the L-INS-i algorithm, probably
most accurate; recommended for <200 sequences; iterative refinement method
incorporating local pairwise alignment information.
"globalpair"
selects the G-INS-i algorithm suitable for
sequences of similar lengths; recommended for <200 sequences; iterative
refinement method incorporating global pairwise alignment information.
"genafpair"
selects the E-INS-i algorithm suitable for
sequences containing large unalignable regions; recommended for <200
sequences.
"retree 1"
selects the FFT-NS-1 algorithm, the simplest
progressive option in MAFFT; recommended for >200 sequences.
"retree 2"
selects the FFT-NS-2 algorithm that uses a second
iteration of alignment based on a guide tree computed from an FFT-NS-1
alignment; this is the default in MAFFT; recommended for >200 sequences.
Value
A matrix
of class "DNAbin"
or "AAbin"
.
Note
mafft
was last updated and tested to work with MAFFT 7.205. If
you have problems getting the function to work with a newer version of
MAFFT, please contact the package maintainer.
References
Katoh, K. and H. Toh. 2008. Recent developments in the MAFFT multiple sequence alignment program. Briefings in Bioinformatics 9: 286-298.
Katoh, K., K.-i. Kuma, H. Toh, and T. Miyata. 2005. Mafft version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research 33: 511–518.
Katoh, K., K. Misawa, K.-i. Kuma, and T. Miyata. 2002. Mafft: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleid Acids Research 30: 3059–3066.
https://mafft.cbrc.jp/alignment/software/index.html
See Also
read.fas
to import DNA sequences; prank
for another alignment algorithm; gblocks
and
aliscore
for alignment cleaning.
Profile Alignment with MAFFT
Description
Merge two or more DNA or amino acid sequence alignments by profile alignment with MAFFT.
Usage
mafft.merge(subMSA, method = "auto", gt, thread = -1, exec, quiet = TRUE)
Arguments
subMSA |
A list of objects of class |
method |
A character string giving the alignment method. Available
accuracy-oriented methods for less than 200 sequences are
|
gt |
An object of class |
thread |
Integer giving the number of physical cores MAFFT should use;
with |
exec |
A character string giving the path to the MAFFT executable
including its name, e.g. something like |
quiet |
Logical, if set to |
Value
An object of class "DNAbin"
or "AAbin"
.
Bayesian MCMC Tree Search with MrBayes
Description
Provides a wrapper for Bayesian phylogenetic tree search through MrBayes (Huelsenbeck & Ronquist, 2001; Ronquist & Huelsenbeck, 2003).
Usage
mrbayes(
x,
file = "",
lset,
prset,
mcmc,
unlink,
constraint,
burnin = 10,
contype = "allcompat",
exec,
run = FALSE
)
Arguments
x |
An object of class |
file |
A character string, giving the name of the MrBayes input file. |
lset |
A list as returned by |
prset |
A list as returned by |
mcmc |
A list as returned by |
unlink |
xxx |
constraint |
xxx |
burnin |
An integer; the number of samples from the MCMC to be discarded prior to further analysis. |
contype |
A character string; the type of consensus tree calculated from
the posterior distribution of trees: either |
exec |
A character string giving the full path of the MrBayes program. |
run |
Logical; |
Details
mrbayes
was last updated and tested with MrBayes
v3.2.2 under R 3.1.0 on a x86_64-apple-darwin10.8.0 (64-bit)
platform. It is intended to offer a simply parameterized building block for
larger scripts.
Value
None; a NEXUS file with MrBayes block is written to a file and, if
run = TRUE
, the MCMC runs in MrBayes are started.
References
J. P. Huelsenbeck & Ronquist F. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755. Ronquist F. & J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Biometrics 19: 1572-1574. MrBayes website: https://mrbayes.sourceforge.net/.
See Also
mafft
and prank
for sequence alignment;
raxml
for maximum likelihood tree search.
Examples
data(ips.cox1)
x <- ips.cox1[, 100:140] # tiny alignment
mrbayes(x, file = "", mcmc = mrbayes.mcmc(ngen = 100), run = FALSE)
## Not run:
library(phangorn)
tree <- rtree(10)
Y1 <- simSeq(tree, l = 20)
Y2 <- simSeq(tree, l = 20, type = "USER", levels=c("0", "1"))
Y <- cbind(as.character(Y1), as.character(Y2))
## End(Not run)
Model Settings for MrBayes
Description
Set model parameters for mrbayes
.
Usage
mrbayes.lset(..., partition)
Arguments
... |
arguments in |
partition |
a character string giving the labelling for a partion. |
Value
a list containing a subset (including the empty and the full set) of model parameters.
Model Parameters
- nucmodel
-
"4by4"
,"doublet"
,"codon"
, or"protein"
. - nst
-
1
,2
,6
, or"mixed"
. - code
-
"universal"
,"vertmt"
,"mycoplasma"
,"yeast"
,"ciliates"
, or"metmt"
. - ploidy
-
"haploid"
,"diploid"
, or"zlinked"
. - rates
-
"equal"
,"gamma"
,"propinv"
,"invgamma"
, or"adgamma"
. - ngammacat
-
1-24
- nbetacat
-
1-24
- omegavar
-
"equal"
,"ny98"
, or"m3"
. - covarion
-
"no"
or"yes"
. - coding
-
"all"
,"variable"
,"noabsencesites"
, or"nopresencesites"
. - parsmodel
-
"no"
or"yes"
.
Author(s)
Christoph Heibl
References
J.P. Huelsenbeck & Ronquist F. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755.
Ronquist F. & J.P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Biometrics 19: 1572-1574.
MrBayes website: https://mrbayes.sourceforge.net/.
See Also
mrbayes.prset
to set prior distributions, mrbayes.mcmc
to set parameters of the Markov chain Monte Carlo (MCMC), and mrbayes
to run MrBayes locally or prepare input files for a computer cluster.
Examples
## F81
mrbayes.lset(nst = 2)
## GTR + Gamma
mrbayes.lset(nst = 6, rates = "gamma")
## GTR + Gamma + I
mrbayes.lset(nst = 6, rates = "invgamma")
MCMC Settings for MrBayes
Description
Set Markov chain Monte Carlo (MCMC) parameters for mrbayes
.
Usage
mrbayes.mcmc(...)
Arguments
... |
arguments in |
Value
a list containing a subset (including the empty and the full set) of model parameters.
MCMC Parameters
- ngen
-
"NUMERIC"
- nruns
-
"NUMERIC"
- nchains
-
"NUMERIC"
- temp
-
"NUMERIC"
- swapfreq
-
"NUMERIC"
- nswaps
-
"NUMERIC"
- samplefreq
-
"NUMERIC"
- printfreq
-
"NUMERIC"
- printall
-
"yes"
or"no"
- printmax
-
"NUMERIC"
- mcmcdiagn
-
"yes"
or"no"
- diagnfreq
-
"NUMERIC"
- diagnstat
-
"avgstddev"
or"maxstddev"
- minpartfreq
-
"NUMERIC"
- allchains
-
"yes"
or"no"
- allcomps
-
"yes"
or"no"
- relburnin
-
"yes"
or"no"
- burnin
-
"NUMERIC"
- burninfrac
-
"NUMERIC"
- stoprule
-
"yes"
or"no"
- stopval
-
"NUMERIC"
- savetrees
-
"yes"
or"no"
- checkpoint
-
"yes"
or"no"
- checkfreq
-
"NUMERIC"
- startparams
-
"current"
or"reset"
- starttree
-
"current"
,"random"
, or"parsimony"
- nperts
-
"NUMERIC"
- data
-
"yes"
or"no"
- ordertaxa
-
"yes"
or"no"
- append
-
"yes"
or"no"
- autotune
-
"yes"
or"no"
- tunefreq
-
"NUMERIC"
Note
The parameters reweight and filename cannot be set via mrbayes.mcmc
.
Author(s)
Christoph Heibl
References
J.P. Huelsenbeck & Ronquist F. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755.
Ronquist F. & J.P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Biometrics 19: 1572-1574.
MrBayes website: https://mrbayes.sourceforge.net/.
See Also
mrbayes.lset
to set model parameters, mrbayes.prset
to set prior distributions, and mrbayes
to run MrBayes locally or prepare input files for a computer cluster.
Examples
mrbayes.mcmc()
Set Priors for MrBayes
Description
Set prior distributions for mrbayes
.
Usage
mrbayes.prset(...)
Arguments
... |
arguments in |
Value
a list of length zero (see 'Note')
Prior Distribution Parameters
- traitiopr
- revmatpr
- aamodelpr
- aarevmatpr
- omegapr
- ny98omega1pr
- ny98omega3pr
- m3omegapr
- codoncatfreqs
- statefreqpr
- shapepr
- ratecorrpr
- pinvarpr
- covswitchpr
- symdirihyperpr
- topologypr
- brlenspr
- clockvarpr
- igrvarpr
Note
This function currently returns an empty set of prior distribution parameters, i.e., you cannot change the MrBayes default parameters.
Author(s)
Christoph Heibl
References
J.P. Huelsenbeck & Ronquist F. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755.
Ronquist F. & J.P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Biometrics 19: 1572-1574.
MrBayes website: https://mrbayes.sourceforge.net/.
See Also
mrbayes.lset
to set model parameters, mrbayes.mcmc
to set parameters of the Markov chain Monte Carlo (MCMC), and mrbayes
to run MrBayes locally or prepare input files for a computer cluster.
Examples
mrbayes.prset()
MULTISTATE
Description
These functions provide wrappers to BayesMultiState
in the BayesTraits package written by Mark Pagel and Andrew Meade.
Usage
multistateML(phy, traits, model = "ARD", anc.states = TRUE,
path = "/Applications/BayesTraits", dir = NULL)
multistateMCMC(phy, traits, model = "ARD", anc.states = TRUE,
rd = 2, rjhp = NULL, fixNodes = NULL, it = 1e+05, bi = 10000,
sa = 1000, path = "/Applications/BayesTraits", dir = NULL)
Arguments
phy |
an object of class |
traits |
a |
model |
a character string to select a model of rate evolution. One of |
anc.states |
either |
rd |
a real number, giving the |
rjhp |
a character string giving the details of priors and hyperpriors for the reversible jump MCMC (rjMCMC). If left |
fixNodes |
a list giving fixed character states of certain internal nodes. This argument corresponds to the |
it |
numeric, sets the number of iterations to run the MCMC for. |
bi |
numeric, sets the number of iterations of the MCMC that will be discarded as burn-in. |
sa |
numeric, sets the the sample period in the MCMC. |
path |
a character string giving the path to executables in the BayesTraits package. |
dir |
a character string giving a directory name where the input and output files will be stored. The directory will be created by |
Author(s)
Christoph Heibl
References
The BayesTraits manual: http://www.evolution.reading.ac.uk/BayesTraitsV4.1.1/Files/BayesTraitsV4.1.1-Manual.pdf.
Pagel, M., A. Meade, and D. Barker. 2004. Bayesian estimation of ancestral character states on phylogenies. Syst. Biol. 53: 673-684.
Pagel, M. and A. Meade. 2006. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. Am. Nat. 167: 808-825.
See Also
ace
Neighboring Nodes in a Minimum Spanning Tree
Description
Finds all pairs of adjacent nodes, i.e. nodes separated by only one edge, in a minimum spanning tree
Usage
neighboringPairs(mst)
Arguments
mst |
An object of class |
Numbers of Tips of (Sub)trees
Description
Counts the number of tips of a given clade of a phylogenetic tree.
Usage
ntip(phy, node)
Arguments
phy |
An object of class |
node |
An integer given the number of an internal node. |
Value
An integer giving the number of tips.
Examples
set.seed(1234)
tr <- rtree(12)
plot(tr); nodelabels()
ntip(tr, 16)
PartitionFinder
Description
Provides a wrapper to the PartitionFinder software.
Usage
partitionfinder(
alignment,
user.tree,
branchlengths = "linked",
models = "all",
model.selection = "BIC",
search = "greedy",
exec = "/Applications/PartitionFinderV1.1.1_Mac/PartitionFinder.py"
)
Arguments
alignment |
A |
user.tree |
A |
branchlengths |
A |
models |
A |
model.selection |
A |
search |
A |
exec |
A character string giving the path to the executable (python script). |
References
Lanfear, R., B. Calcott, S.Y.W. Ho, and S. Guindon. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution 29: 1695-1701.
Lanfear, R., B. Calcott, K. David, C. Mayer, and A. Stamatakis. 2014. Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evolutionary Biology 14: 82.
PATHd8
Description
This function is a wrapper for PATHd8 and can be used for phylogenetic dating, especially of large trees
Usage
pathd8(phy, exec = "/Applications/PATHd8/PATHd8", seql, calibration)
Arguments
phy |
An object of class |
exec |
A character string giving the path to the PATHd8 program. |
seql |
sequence length of alignment |
calibration |
A data frame with 4 columns and as many rows as calibration points. Columns are: taxon 1; taxon 2; one of c("minage", "maxage", "fixage"); age. |
Value
tree list of ultrametric trees returned from PATHd8 of class
phylo
. First tree is PATHd8 chronogram, which is a calibrated
ultrametric tree. Second is a PATH tree, which is a ultrametric tree
without calibration.
Author(s)
Franz-Sebastian Krah
References
Britton et al (2006). PATHd8—a new method for estimating divergence times in large phylogenetic trees without a molecular clock. Available from the authors (www.math.su.se/PATHd8)
Britton et al. (2007). Estimating divergence times in large phylogenetic trees. Systematic biology. 56:741–752
Examples
## Not run:
## This example is taken from the PATHD8 manual
cal <- rbind(cal1 = c("Rat", "Ostrich", "minage", 260),
cal2 = c("Human", "Platypus", "fixage", 125),
cal3 = c("Alligator", "Ostrich", "minage", 150))
colnames(cal) = c("tax1", "tax2", "age_type", "age")
phy <- read.tree(text = paste0("((((Rat:0.007148,Human:0.001808):0.024345,",
"Platypus:0.016588):0.012920,(Ostrich:0.018119,",
"Alligator:0. 006232):0.004708):0.028037,Frog:0);")
seql <- 1823
pathd8(phy, exec = "/Applications/PATHd8/PATHd8", seql = seql, calibration = cal)
## End(Not run)
Convert Trees for MAFFT
Description
Converts a phylogenetic tree of class "phylo"
to a format
usable as a guide tree by MAFFT. This function is called internally by
mafft
.
Usage
phylo2mafft(phy, file)
Arguments
phy |
A phylogenetic tree of class |
file |
A character string giving a filename. May be missing, in which case the results are only printed on the screen. |
Value
A matrix coding the MAFFT-formatted tree, as a side effect the same
matrix is written to file
.
References
The MAFFT website: https://mafft.cbrc.jp/alignment/software/index.html
See Also
mafft
for an interface to MAFFT.
Conversion from PHYLO to MST Object
Description
Converts a phylogenetic tree (class phylo
) into a
minimum spanning tree (class mst
).
Usage
phylo2mst(phy)
Arguments
phy |
An object of class |
Details
The current version of phylo2mst
does not handle polytomies
and does not incorporate branch length information. Note that topological
information is lost during the conversion.
Examples
phy <- rtree(12)
plot(phy)
mst <- phylo2mst(phy)
plot(mst)
Number of Potentially-Informative Sites
Description
Returns the number or positions of potentially-informative (parsimony-informative, phylogenetically-informative) sites in DNA sequence alignment.
Usage
pis(x, what = "fraction", use.ambiguities = FALSE)
Arguments
x |
An object of class |
what |
Either of |
use.ambiguities |
Not yet available. |
Value
Numeric (depending on what
, the number, fraction, or indices
of potentially-informative nucleotide sites).
Examples
data(ips.16S)
# number of potentially-informative sites:
pis(ips.16S, what = "abs")
# proportion of potentially-informative sites:
pis(ips.16S, what = "frac")
# indices of potentially-informative sites:
pis(ips.16S, what = "ind")
PRANK
Description
DNA sequence Alignment Using the program PRANK.
Usage
prank(x, outfile, guidetree = NULL, gaprate = 0.025,
gapext = 0.75, path)
Arguments
x |
an object of class |
outfile |
a character string giving a name for the output file. |
guidetree |
an object of class |
gaprate |
numeric giving the gap opening rate; defaults to 0.025. |
gapext |
numeric giving the gap extension penalty; defaults to 0.75. |
path |
a character string indicating the path to the PRANK executable. |
Value
matrix
of class "DNAbin"
Note
prank
was last updated and tested to work with PRANK v. 120814 on Windows XP. If you have problems getting the function to work with a newer version of PRANK, contact the package maintainer.
References
http://wasabiapp.org/software/prank/
See Also
read.fas
to import DNA sequences;
mafft
for another alignment algorithm;
gblocks
and aliscore
for alignment cleaning.
Maximum Likelihood Tree Estimation with RAxML
Description
Provides an interface to the C program RAxML (see Reference section) for maximum likelihood estimation of tree topology and/or branch lengths, rapid and conventional non-parametric bootstrapping, mapping splits onto individual topologies, and a lot more. See the RAxML manual for details, especially if you are a new user of RAxML.
Usage
raxml(
DNAbin,
m = "GTRCAT",
f,
N,
p,
b,
x,
k,
weights,
partitions,
outgroup,
backbone = NULL,
file = paste0("fromR_", Sys.Date()),
exec,
threads
)
Arguments
DNAbin |
A matrix of DNA sequences of class |
m |
A vector of mode |
f |
A vector of mode |
N |
Either of mode |
p |
Integer, setting a random seed for the parsimony starting trees. |
b |
Integer, setting a random seed for bootstrapping. |
x |
Integer, setting a random seed for rapid bootstrapping. |
k |
Logical, if |
weights |
A vector of mode |
partitions |
A data frame giving the partitions of the alignment. |
outgroup |
A vector of mode |
backbone |
A |
file |
A vector of mode |
exec |
A vector of mode |
threads |
Integer, giving the number of parallel threads to use (PTHREADS only). |
Details
There are some limitations of this wrapper compared to RAxML run directly from the command line.
Only DNA is allowed as data type.
Option
f
can only take a limited number of values (d
,a
).
RAxML needs the specification of random seeds for parsimony estimation of
starting trees and for bootstrap resampling. The corresponding argument
names in raxml
are identical to the flags used by RAxML (-p
,
-b
, and -x
). If you choose not to give any values,
raxml
will generate a (different) value for each required random
seed every time it is called. Be aware that set.seed
will work
only for p
, but not for b
or x
.
Value
A list with a variable number of elements, depending on the analysis chosen:
"info" | RAxML log file as character string |
"bestTree" | MLE of tree |
"bipartitions" | MLE of tree annotated with bootstrap proportions |
"bootstrap" | bootstrapped trees |
Note
RAxML is a C program and the source code is not contained in this
package. This means that in order to run this function you will need to
install RAxML yourself. See
https://cme.h-its.org/exelixis/web/software/raxml/ for the most recent
documentation and source code of RAxML. Depending on where you chose to
install RAxML, you need to adjust the exec
argument.
raxml
was last tested and running fine on Mac OS X with RAxML 8.0.29.
Please be aware that calling third-party software from within R is a
platform-specific process and I cannot guarantee that raxml
will
behave properly on any system.
References
(in chronological order)
Stamatakis, A., T. Ludwig and H. Meier. 2004. RAxML-III: A fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 1: 1–8.
Stamatakis, A. 2006. RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
Stamatakis, A., P. Hoover, and J. Rougemont. 2008. A rapid bootstrap algorithm for the RAxML web-servers. Syst. Biol. 75: 758–771.
Pattengale, N. D., M. Alipour, O. R. P. Bininda-Emonds, B. M. E. Moret, and A. Stamatakis. 2010. How many bootstrap replicates are necessary? Journal of Computational Biology 17: 337-354.
Stamatakis, A. 2014. RAxML Version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics Advance Access.
See Also
raxml.partitions
to store partitioning information in a
data frame suitable for input as partitions
argument in raxml
.
Examples
## bark beetle sequences
data(ips.cox1)
data(ips.16S)
data(ips.28S)
ips <- cbind(ips.cox1, ips.16S, ips.28S,
fill.with.gaps = TRUE)
exec <- "/Applications/RAxML-code/standard-RAxML/raxmlHPC-PTHREADS-AVX"
w <- sample(1:5, ncol(ips.cox1), replace = TRUE)
## Not run:
# Simple tree search with GTRCAT and GTRGAMMA
tr <- raxml(ips.cox1, f = "d", N = 2, p = 1234,
exec = exec) # -1743.528461
tr <- raxml(ips.cox1, m = "GTRGAMMA", f = "d", N = 2, p = 1234,
exec = exec)
# Applying weights to columns
tr <- raxml(ips.cox1, f = "d", N = 2, p = 1234,
weights = w, exec = exec) # -1743.528461
# Rapid bootstrap
tr <- raxml(ips.cox1, m = "GTRGAMMA",
f = "a", N = 10, p = 1234, x = 1234,
exec = exec)
# Rapid bootstrap with automatic halt
tr <- raxml(ips.cox1, m = "GTRGAMMA",
f = "a", N = "autoMRE", p = 1234, x = 1234,
exec = exec)
## End(Not run)
Partition scheme for RAxML
Description
Given a set of DNA sequence alignments, raxml.partitions
creates a data frame with partition bounderies that can be input into raxml
.
Usage
raxml.partitions(...)
Arguments
... |
Two or more DNA sequence alignments of class |
Details
For raxml.partitions
to make sense, the DNA sequence alignments must be given exactly in the same order in which they are concatenated into a supermatrix (see Examples section). Without any testing, the type of sequences is supposed to be DNA.
Value
A data frame with four columns (type
, locus
, begin
, and end
) and number of rows corresponding to the number of partitions.
See Also
cbind.DNAbin
to concatenate multiple alignments; raxml
for an interface to RAxML.
Examples
## bark beetle sequences
data(ips.cox1)
data(ips.16S)
data(ips.28S)
## Note the same order of individual
## alignments in both functions:
## ----------------------------
raxml.partitions(cox1 = ips.cox1,
r16S = ips.16S,
r28S = ips.28S)
cbind(ips.cox1, ips.16S, ips.28S,
fill.with.gaps = TRUE)
XML Input Files for BEAST
Description
Prepare XML files for BEAST with R. BEAST uses an MCMC approach to estimate rooted phylogenies from molecular data (Drummond & Rambaut, 2007).
Usage
rbeauti(
...,
file,
template = "standard",
link.clocks = TRUE,
link.trees = TRUE,
subst.model,
clock,
tree,
taxonset,
chain.length = 1e+07,
log.every = 1000
)
Arguments
... |
One or more object(s) of class |
file |
A connection, or a character string naming the file to write to. If left empty the XML tree will be printed to the screen (see Examples). |
template |
Currently unused. |
link.clocks |
Logical, indicating if clock models should be linked over partitions. |
link.trees |
Logical, indicating if tree models should be linked over partitions. |
subst.model |
A character string defining a substituion model, either
|
clock |
A character string defining a clock model, either |
tree |
A character string defining a tree model. |
taxonset |
A list containing one or more taxon sets. |
chain.length |
Integer, the number of generations to run the MCMC. |
log.every |
Integer, defining how often samples from the posterior will be written to log files and shown on screen. |
Details
rbeauti
has been completely rewritten to work with
BEAST 2. Currently rbeauti
offers few options, because the
idea is not to create ready-to-use XML file. That can be done conveniently
with BEAUti (the BEAST package's genuine XML generator). Instead,
rbeauti
is intended to make the definition of large numbers of taxon
sets easy. The creation of taxon sets can be done via R scripts and the
resulting XML files can be further modified with BEAUti.
References
The XML reference at the BEAST 2 website: http://beast.community/xml_reference Drummond, A.J. & A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7: 240.
See Also
Examples
data(ips.16S)
## define taxon sets
spec <- rownames(ips.16S)
ingroup <- spec[grep("Ips|Orthomotomicus", spec)]
outgroup <- spec[grep("Pityogenes", spec)]
ts <- list(list(id = "ingroup", taxon = ingroup),
list(id = "outgroup", taxon = outgroup))
## print XML file to screen
# rbeauti(ips.16S, taxonset = ts)
Reverse-Complement of DNA sequences
Description
Reverse, complement or reverse-complement of DNA sequences.
Usage
rc(seqs, i, complement = TRUE, reverse = TRUE, delete.gaps = FALSE)
Arguments
seqs |
An object of class |
i |
Logical or numeric index to indicate a subset of sequences to manipulate. If not given the entire set of sequences will be manipulated. |
complement |
Logical, indicating if sequences will be turned into their complement. |
reverse |
Logical, indication if sequences will be reversed. |
delete.gaps |
Logical, indicating if gap character will be removed prior to sequence manipulation. |
Value
An object of the same class and dimension as seqs
.
Examples
## A minimal sequence alignment:
x <- list(
seqA = c("a", "a", "c", "c", "g", "t"),
seqB = c("n", "-", "r", "y", "g", "t"))
x <- as.DNAbin(x)
## Three choices of manipulation:
as.character(x)
as.character(rc(x)) ## reverse-complement
as.character(rc(x, complement = FALSE)) ## only reverse
as.character(rc(x, reverse = FALSE)) ## only complement
## You can remove gaps:
as.character(rc(x, delete.gaps = TRUE)) ## gaps/indels removed
Reading Sequence Files
Description
Read DNA and amino acid sequences from FASTA, PHILIP, and NEXUS formatted files.
Usage
read.fas(x, text)
read.nex(x)
read.phy(x)
Arguments
x |
A character string, giving the file name. |
text |
A character string in FASTA format. |
Value
An matrix (aligned sequences) or list (unaligned sequences) of class
DNAbin
or AAbin
.
References
Maddison, D.R., D.L. Swofford, and W.P. Maddison. 1997. NEXUS: an extensible file format for systematic information. Syst. Biol. 46: 590-621.
See Also
mafft
and prank
for sequence alignment,
gblocks
and aliscore
for quality check and
cleaning of sequence alignments, cbind.DNAbin
for
concatenation of sequence alignments.
Examples
## bark beetle COX1 sequences
data(ips.cox1)
## create temporary file names
format <- c(".fas", ".phy", ".nex")
fn <- sapply(format, tempfile,
pattern = "ips", tmpdir = tempdir())
## write sequences files
write.fas(ips.cox1, fn[".fas"])
write.phy(ips.cox1, fn[".phy"])
write.nex(ips.cox1, fn[".nex"])
## read sequence files
fas <- read.fas(fn[".fas"])
phy <- read.phy(fn[".phy"])
nex <- read.nex(fn[".nex"])
## remove sequence files
unlink(fn)
Read Bayesian Trees
Description
Thess functions parse chronograms in NEXUS format as produced by TreeAnnotator or output by MrBayes.
Usage
read.mrbayes(file, digits = NULL)
read.beast(file, digits = NULL)
read.starbeast(file)
Arguments
file |
A character string giving the input file, which must be a TreeAnnotator-generated chronogram in NEXUS format. |
digits |
NULL or integer, if |
Value
An object of class phylo
.
Note
read.starbeast
currently parses only skalars and ranges; node statistics with more than two values will be deleted and a warning message will be issued. Future version of read.starbeast
will hopefully be able to append list or data frames to phylo
objects. If you have any opinion or wishes regarding the question of how this exactly should be managed, send me a message.
read.mrbayes
is intended to read single trees with annotated nodes. For reading tree samples from the posterior distribution there is a collection of perl script written by Johan Nylander (https://github.com/nylander/Burntrees).
Author(s)
Christoph Heibl
References
TreeAnnotator: http://beast.community/treeannotator
Metacomments in NEXUS: https://code.google.com/archive/p/beast-mcmc/wikis/NexusMetacommentFormat.wiki
See Also
read.beast.table
to extract internal node data from NEXUS file, rbeauti
to create XML input for BEAST. HPDbars
for plotting highest posterior densities on phylogenies has been moved to package viper.
Extract node data from BEAST chronogram
Description
This function reads a BEAST chronogram such as produced by TreeAnnotator and extracts time, rate, and support values for internal and external nodes. Nodes in the resulting data frame are ordered exactly like in the NEXUS file.
Usage
read.beast.table(file, digits = 2)
Arguments
file |
character string giving the input file, which must be a TreeAnnotator-generated chronogram in NEXUS format |
digits |
NULL or integer, if |
Value
A matrix; each row corresponds to an internal node, the (ape!)number of which is given in the first column; the remaining columns list the node values extracted from the chronogram.
Author(s)
Christoph Heibl
See Also
read.beast
to parse TreeAnnotator output, rbeauti
to create XML input for BEAST. HPDbars
for plotting highest posterior densities on phylogenies has been moved to package viper.
Identification of Sister Nodes and Clades
Description
For any given internal node in a phylogeny, this function returns the sister clade.
Usage
sister(phy, node, type = "terminal", label = FALSE)
Arguments
phy |
An object of class |
node |
A vector of mode |
type |
A character string, may be |
label |
Logical, determining if tip number or tip labels will be returned. |
Value
A vector of mode "numeric"
or "character"
, containing
either tip numbers or labels, respectively.
See Also
Examples
# A phylogeny of bark beetles ...
data(ips.tree)
tcol <- rep("black", Ntip(ips.tree))
tcol[ips.tree$tip.label %in% c("Ips_typographus", "Ips_nitidus")] <- "blue"
tcol[ips.tree$tip.label %in% c("Ips_duplicatus")] <- "red"
plot(ips.tree, no.margin = TRUE, tip.color = tcol)
# What is the sister species of Ips typographus?
sister(ips.tree, "Ips_typographus", label = TRUE)
# Return the MRCA of the sister clade of Ips duplicatus
x <- sister(ips.tree, "Ips_duplicatus", "daughter")
nodelabels(node = x, pch = 21, bg = "red")
Find Monophyletic Subsets in Species Lists
Description
Takes a phylogeny and a subset of its tiplabels and splits the list of tiplabels into monophyletic groups (clades).
Usage
splitIntoClades(phy, tips)
Arguments
phy |
An object of class |
tips |
A vector of mode |
Value
A list.
Find Pairs of Sister Species
Description
Finds pairs of sister species in a phylogenetic tree.
Usage
terminalSisters(phy, labels = TRUE)
Arguments
phy |
An object of class |
labels |
Logical, indicating whether to return tip labels or tip numbers. |
Value
A list of which each element contains the tip labels of a sister species pair.
Examples
set.seed(1234)
tr <- rtree(12)
plot(tr)
terminalSisters(tr)
Tip Heights in a Phylogenetic Tree
Description
For each tip (leave, terminal node) in the phylogenetic tree the edge lengths (branch lengths) from root to tip, be it units of time or divergence, is summed up.
Usage
tipHeights(phy)
Arguments
phy |
an object of class |
Value
a numeric vector with distances from root to tip for each tip in the phylogenetic tree.
Author(s)
Christoph Heibl
See Also
branching.times
Trait-Dependent Shifts in Molecular Rate
Description
Detection of trait-dependent shifts in the rate of molecular evolution with traitRate (Mayrose & Otto, 2011).
Usage
traitRate(phy, seq, x, mainType = "Optimize_Model",
n, charModelParam1 = 0.5, charModelParam2 = 1,
gammaParam = 0.5, seqModelParam1 = 2,
exec = "/Applications/traitRate-1.1/programs/traitRate")
Arguments
phy |
a ultrametric phylogenetic tree of class |
seq |
a multiple sequence alignment of class |
x |
data frame containing a binary character in the first column. |
mainType |
character string giving the type of analysis; two choices are possible: |
n |
numeric, the number of bootstrap replicates. Will be ignored if |
charModelParam1 |
numeric, giving an initial value for the rate of transitions of character state 0 to 1. |
charModelParam2 |
numeric, giving an initial value for the rate of transitions of character state 1 to 0. |
gammaParam |
numeric, giving an initial value for the alpha parameter of the model of sequence evolution. |
seqModelParam1 |
numeric, giving an initial value for the kappa parameter of the model of sequence evolution. |
exec |
character string giving the path to the program directory. |
Value
Currently none, but look for the output files in the 'RESULTS'
subdirectory in the current working directory.
Note
This function is under development!
Author(s)
Christoph Heibl
References
Mayrose, I. & S.P. Otto. 2011. A likelihood method for detecting trait-dependent shifts in the rate of molecular evolution. Mol. Biol. Evol. 28: 759-770
See Also
read.tree
for reading phylogenetic trees, read.fas
for reading multiple sequence alignments in FASTA format.
Trim Alignment Ends
Description
Trims both ends of a DNA sequence alignment to the first and
last alignment positions that contain a minimum number of IUPAC base
characters ("a"
, "c"
, "g"
, "t"
, "r"
,
"y"
, "s"
, "w"
, "k"
, "m"
, "b"
,
"d"
, "h"
, "v"
). In addition, all gap characters
("-"
) beyond the first and last base characters of each sequence are
replaced by the character "n"
.
Usage
trimEnds(x, min.n.seq = 4)
Arguments
x |
An object of class |
min.n.seq |
A |
Value
An object of class DNAbin
.
See Also
Examples
# simple example alignment:
x <- structure(list(nb = 5, seq = c("acaaggtaca", "-caaggtac-",
"acaaggtaca", "aca--gtaca", "-ccaggta--"), nam = LETTERS[1:5]),
.Names = c("nb", "seq", "nam"), class = "alignment")
# convert to DNAbin:
x <- as.DNAbin(x)
# fill missing nucleotides:
x <- trimEnds(x)
# show results:
as.character(x[2, ])
Unlist To First Level Only
Description
Does the same as unlist
, but recurses only
one level.
Usage
unlistFirstLevel(z, use.names = TRUE)
Arguments
z |
A list of lists. |
use.names |
Logical, indicating if element names from the element should be preserved. |
Write DNA Sequences to File
Description
Write DNA sequences and morphological data to FASTA, PHYLIP, or NEXUS formatted files.
Usage
write.fas(x, file, block.width = FALSE,
truncate = FALSE, append = FALSE)
write.phy(x, file, block.width = FALSE,
strict = FALSE)
write.nex(x, file, block.width = 60,
taxblock = FALSE)
Arguments
x |
an object of class |
file |
a character string giving the filename; a special case is |
block.width |
an integer, giving the number of characters per line. |
truncate |
truncation of taxon names to the number of characters given as a integer, otherwise (default) taxon names will not be changed. |
append |
logical, if |
strict |
logical, if |
taxblock |
logical, if |
Details
write.nex
can handle multiple DNA sequence alignments, which are handed over as a list of objects of class DNAbin
. Correct matching of the rows in the alignments is cared for automatically, hence the individual alignments can contain different numbers of samples and samples need not be in the same order.
Value
None, except when called with file
left unspecified, which causes the file content to be returned as a vector of mode "character"
. This is particularly useful for constructing special types of input files, e.g. for MrBayes (mrbayes
).
Author(s)
Christoph Heibl
References
Maddison, D.R., D.L. Swofford, and W.P. Maddison. 1997. NEXUS: an extensible file format for systematic information. Syst. Biol. 46: 590-621.
See Also
read.fas
, read.phy
, and read.nex
for reading of DNA sequence files.
Examples
data(ips.cox1)
data(ips.28S)
## Examples for FASTA files
## ------------------------
write.fas(ips.cox1[1:5, 1:120], block.width = 60)
## Examples for PHYLIP files
## ------------------------
write.phy(ips.cox1[1:5, 1:20], block.width = 40)
## Examples for NEXUS files
## ------------------------
x <- list(cox1 = ips.cox1[1:5, 1:10],
rna28S = ips.28S[1:5, 1:30])
write.nex(x, block.width = 20)
# Truncation of taxonnames:
# -------------------------
rownames(ips.cox1)[1] <- "AVeeeeeeeeeeeeeeeeeryLongName"
write.fas(ips.cox1, truncate = 10)
# If truncation leads to identical taxonnames,
# a warning will be issued:
# -------------------------
rownames(ips.cox1)[1:2] <- "AVeeeeeeeeeeeeeeeeeryLongName"
write.fas(ips.cox1, truncate = 10)