Type: | Package |
Title: | Scelestial: Steiner Tree Based Single-Cell Lineage Tree Inference |
Version: | 1.0.4 |
Date: | 2023-11-29 |
Maintainer: | Mohammad Hadi Foroughmand Araabi <foroughmand@gmail.com> |
Description: | Scelestial infers a lineage tree from single-cell DNA mutation matrix. It generates a tree with approximately maximum parsimony through a Steiner tree approximation algorithm. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Imports: | Rcpp (≥ 1.0.1) |
LinkingTo: | Rcpp |
RoxygenNote: | 7.2.3 |
Suggests: | igraph, knitr, rmarkdown, stringr, seqinr, spelling |
VignetteBuilder: | knitr, rmarkdown |
Encoding: | UTF-8 |
Language: | en-US |
NeedsCompilation: | yes |
Packaged: | 2023-11-30 18:58:29 UTC; hforoughmand |
Author: | Mohammad Hadi Foroughmand Araabi [aut, cre], Sama Goliaei [aut, ctb], Alice McHardy [ctb] |
Repository: | CRAN |
Date/Publication: | 2023-11-30 21:00:02 UTC |
Internal function for running scelestial algorithm.
Description
Internal function for running scelestial algorithm.
Usage
.scelestial(data, minK = 3L, maxK = 4L)
Arguments
data |
The data |
minK , maxK |
Minimum and maximum number of vertices to be considered for k-restricted Steiner tree. |
Value
The tree as well as missing value imputation
Internal function for generating synthetic single-cell data through simulation of tumor growth and evolution.
Description
Internal function for generating synthetic single-cell data through simulation of tumor growth and evolution.
Usage
.synthesis(
sample,
site,
evolutionSteps,
mutationRate = 0.01,
advantageIncreaseRatio = 1,
advantageDecreaseRatio = 10,
advantageKeepRatio = 100,
advantageIncreaseStep = 0.01,
advantageDecreaseStep = 0.01,
mvRate = 0.5,
fpRate = 0.2,
fnRate = 0.1,
seed = -1L
)
Arguments
sample |
Number of samples |
site |
Number of sites |
evolutionSteps |
Number of non-root nodes in the evolutionary tree to be generated. |
mutationRate |
The rate of mutation on each evolutionary step in evolutionary tree synthesis. |
advantageIncreaseRatio , advantageDecreaseRatio , advantageKeepRatio |
A child node
in the evolutionary tree is chosen for increase/decrease/keep its parent advantage with
probabilities proportional to |
advantageIncreaseStep , advantageDecreaseStep |
The amount of increasing or decreasing the advantage of a cell relative to its parent. |
mvRate |
Rate of missing value to be added to the resulting sequences. |
fpRate , fnRate |
Rate of false positive (0 -> 1) and false negative (1 -> 0) in the sequences. |
seed |
The seed for randomization. |
Value
The function returns a list. The list consists of
-
sequence
: A data frame representing result of sequencing. The data frame has a row for each locus and a column for each sample. -
true.sequence
: The actual sequence for the sample before adding errors and missing values. -
true.clone
: A list that stores index of sampled cells for each node in the evolutionary tree. -
true.tree
: The evolutionary tree that the samples are sampled from. It is a data frame withsrc
,dest
, andlen
columns representing source, destination and weight of edges of the tree, respectively.
Bladder invasive single cell tumor dataset
Description
Bladder invasive single cell tumor dataset
Usage
data(Li)
Format
Each column represent a cell and each row represent a locus. "./." represent the missing value, "A/A" the normal state and "C/C" the mutated state.
Source
References
Gigascience. 2012 Aug 14;1(1):12. doi: 10.1186/2047-217X-1-12. (PubMed)
Examples
data(Li)
RScelestial: An R wrapper for scelestial algorithm for single-cell lineage tree reconstruction through an approximation algorithm based on Steiner tree problem
Description
This package provides a wrapper for the scelestial which is implemented in C++.
The package contains function scelestial
for running the algorithm and
synthesis
for tumor simulation for providing synthetic data.
Conversion of ten-state sequencing matrix to 0/1-mutation matrix.
Description
Conversion of ten-state sequencing matrix to 0/1-mutation matrix.
Usage
as.mutation.matrix(seq)
Arguments
seq |
A dataframe representing the ten-state sequencing matrix. Elements of the matrix are the from "X/Y" for X and Y being nucleotides or "./." for missing value. Rows represent loci and columns represent samples. |
Value
A data frame with exactly the same size as the input seq
matrix.
The most abundant state in each loci (row) translated to 0, and
the others are translated to 1. Missing values are translated to 3.
Examples
## A small 10-state matrix
seq = data.frame("C1" = c("C/C", "C/C"), "C2" = c("A/A", NA), "C3" = c("C/C", "A/A"))
## Convert it to mutation matrix
as.mutation.matrix(seq)
# C1 C2 C3
# 1 0 1 0
# 2 1 3 0
Conversion of 0/1 matrix to 10-state matrix
Description
It converts 0 to A/A and 1 to C/C. 3 that represents missing values are converted to "./.".
Usage
as.ten.state.matrix(mut)
Arguments
mut |
A dataframe representing the mutation matrix. |
Value
A data frame with the exact size as mut
, in which
0, 1 and 3 (or NAs) are replaced with "A/A", "C/C", and "./.",
respectively.
Note
Note that following function does not provide inverse of as.mutation.matrix. It could be used to generate input for scelestial.
Examples
## A small 0/1/NA mutation matrix
mut = data.frame("C1" = c(0, 0), "C2" = c(0, 3), "C3" = c(1, 0))
## Convert it to 10-state matrix
as.ten.state.matrix(mut)
# C1 C2 C3
# 1 A/A A/A C/C
# 2 A/A ./. A/A
Generates 10-state sequence matrix from name/10-char string matrix.
Description
This function is used for conversion of results of internal scelestial result to 10-state sequence matrices.
Usage
as.ten.state.matrix.from.node.seq(n.seq)
Arguments
n.seq |
A two column data frame. First column is the name of a node and the second column is a string representation of the sequencing result. Each element of the sequencing result is from a 10-state representation in which each state represented as a character according to the following encoding:
|
Value
A 10-state sequence data frame with samples as columns
and loci as rows. Elements of n.seq
are translated
to their 10-state representations.
Examples
## A node sequence data frame
n.seq = data.frame("node" = c("C1", "C2"), "seq" = c("AKLTCXAAC", "AKKOCXAPC"))
## Convert it to ten state matrix
as.ten.state.matrix.from.node.seq(n.seq)
# V1 V2 V3 V4 V5 V6 V7 V8 V9
# C1 A/A A/C A/G T/T C/C ./. A/A A/A C/C
# C2 A/A A/C A/C T/G C/C ./. A/A T/A C/C
Calculates distance matrix for result of scelestial
Description
Calculates distance matrix for result of scelestial
Usage
distance.matrix.scelestial(SP, normalize = TRUE)
Arguments
SP |
Output of scelestial function |
normalize |
If true, sum of all elements of resulting table is added up to one. |
Value
The distance matrix
Examples
## Synthesise an evolution
S = synthesis(10, 5, 20, seed=7)
## Run Scelestial
SC = scelestial(as.ten.state.matrix(S$seqeunce))
## Calculate the distance matrix
distance.matrix.scelestial(SC)
# C1 C10 C2 C3 C4
# C1 0.000000000 0.003512891 0.015222451 0.014051472 0.008196692
# C10 0.003512891 0.000000000 0.011709560 0.010538580 0.004683800
# C2 0.015222451 0.011709560 0.000000000 0.010538627 0.007025759
# C3 0.014051472 0.010538580 0.010538627 0.000000000 0.005854780
# C4 0.008196692 0.004683800 0.007025759 0.005854780 0.000000000
# C5 0.011709560 0.008196668 0.003512891 0.007025736 0.003512868
# C6 0.023419213 0.019906322 0.019906368 0.009367741 0.015222521
# C7 0.018735342 0.015222451 0.015222498 0.004683871 0.010538651
# C8 0.015222474 0.011709583 0.014051542 0.012880562 0.007025783
# C9 0.010538627 0.007025736 0.009367695 0.008196715 0.002341935
# C5 C6 C7 C8 C9
# C1 0.011709560 0.023419213 0.018735342 0.015222474 0.010538627
# C10 0.008196668 0.019906322 0.015222451 0.011709583 0.007025736
# C2 0.003512891 0.019906368 0.015222498 0.014051542 0.009367695
# C3 0.007025736 0.009367741 0.004683871 0.012880562 0.008196715
# C4 0.003512868 0.015222521 0.010538651 0.007025783 0.002341935
# C5 0.000000000 0.016393477 0.011709606 0.010538651 0.005854803
# C6 0.016393477 0.000000000 0.004683871 0.022248304 0.017564457
# C7 0.011709606 0.004683871 0.000000000 0.017564433 0.012880586
# C8 0.010538651 0.022248304 0.017564433 0.000000000 0.004683847
# C9 0.005854803 0.017564457 0.012880586 0.004683847 0.000000000
Calculates distance matrix for a nodes on a tree.
Description
It is used for internal purposes.
Usage
distance.matrix.tree(graph, cell.names, tree.nodes, normalize = TRUE)
Arguments
graph |
The tree |
cell.names |
Name of the cells to be the row and column name of the resulting matrix |
tree.nodes |
For each cell.names a tree node is stored in tree.nodes. |
normalize |
If TRUE the resulting matrix is normalized. |
Value
A matrix with equal number of rows and columns, a row/column
for each cell. Elements of matrix represent distance between cells
on the graph
.
Examples
## Synthesise an evolution
S = synthesis(10, 5, 20, seed=7)
## Run Scelestial
SC = scelestial(as.ten.state.matrix(S$seqeunce))
## Calculate the distance matrix
vertices <- rownames(SC$input);
distance.matrix.tree(SC$tree, vertices, vertices, normalize = TRUE)
# C1 C10 C2 C3 C4
# C1 0.000000000 0.003512891 0.015222451 0.014051472 0.008196692
# C10 0.003512891 0.000000000 0.011709560 0.010538580 0.004683800
# C2 0.015222451 0.011709560 0.000000000 0.010538627 0.007025759
# C3 0.014051472 0.010538580 0.010538627 0.000000000 0.005854780
# C4 0.008196692 0.004683800 0.007025759 0.005854780 0.000000000
# C5 0.011709560 0.008196668 0.003512891 0.007025736 0.003512868
# C6 0.023419213 0.019906322 0.019906368 0.009367741 0.015222521
# C7 0.018735342 0.015222451 0.015222498 0.004683871 0.010538651
# C8 0.015222474 0.011709583 0.014051542 0.012880562 0.007025783
# C9 0.010538627 0.007025736 0.009367695 0.008196715 0.002341935
# C5 C6 C7 C8 C9
# C1 0.011709560 0.023419213 0.018735342 0.015222474 0.010538627
# C10 0.008196668 0.019906322 0.015222451 0.011709583 0.007025736
# C2 0.003512891 0.019906368 0.015222498 0.014051542 0.009367695
# C3 0.007025736 0.009367741 0.004683871 0.012880562 0.008196715
# C4 0.003512868 0.015222521 0.010538651 0.007025783 0.002341935
# C5 0.000000000 0.016393477 0.011709606 0.010538651 0.005854803
# C6 0.016393477 0.000000000 0.004683871 0.022248304 0.017564457
# C7 0.011709606 0.004683871 0.000000000 0.017564433 0.012880586
# C8 0.010538651 0.022248304 0.017564433 0.000000000 0.004683847
# C9 0.005854803 0.017564457 0.012880586 0.004683847 0.000000000
Calculates distance matrix for a synthetized data
Description
Calculates distance matrix for a synthetized data
Usage
distance.matrix.true.tree(D, normalize = TRUE)
Arguments
D |
Output of synthesis function |
normalize |
If true, sum of all elements of resulting table is added up to one. |
Value
The distance matrix of the true tree.
Examples
## Synthesise an evolution
S = synthesis(10, 5, 20, seed=7)
## Calculating the distance matrix of the true tree.
distance.matrix.true.tree(S)
# C3 C6 C4 C2 C7
# C3 0.000000000 0.004587156 0.006880734 0.009174312 0.013761468
# C6 0.004587156 0.000000000 0.002293578 0.009174312 0.013761468
# C4 0.006880734 0.002293578 0.000000000 0.011467890 0.016055046
# C2 0.009174312 0.009174312 0.011467890 0.000000000 0.004587156
# C7 0.013761468 0.013761468 0.016055046 0.004587156 0.000000000
# C10 0.006880734 0.006880734 0.009174312 0.011467890 0.016055046
# C8 0.006880734 0.011467890 0.013761468 0.016055046 0.020642202
# C9 0.006880734 0.011467890 0.013761468 0.016055046 0.020642202
# C1 0.011467890 0.011467890 0.013761468 0.002293578 0.006880734
# C5 0.011467890 0.011467890 0.013761468 0.002293578 0.006880734
# C10 C8 C9 C1 C5
# C3 0.006880734 0.006880734 0.006880734 0.011467890 0.011467890
# C6 0.006880734 0.011467890 0.011467890 0.011467890 0.011467890
# C4 0.009174312 0.013761468 0.013761468 0.013761468 0.013761468
# C2 0.011467890 0.016055046 0.016055046 0.002293578 0.002293578
# C7 0.016055046 0.020642202 0.020642202 0.006880734 0.006880734
# C10 0.000000000 0.013761468 0.013761468 0.013761468 0.013761468
# C8 0.013761468 0.000000000 0.000000000 0.018348624 0.018348624
# C9 0.013761468 0.000000000 0.000000000 0.018348624 0.018348624
# C1 0.013761468 0.018348624 0.018348624 0.000000000 0.000000000
# C5 0.013761468 0.018348624 0.018348624 0.000000000 0.000000000
Runs DFS on tree and calculates parent of each node as well as depth and upper-depth of nodes.
Description
It is used for internal purposes.
Usage
my.dfs(graph, root = NULL)
Arguments
graph |
The tree |
root |
The starting node of DFS. |
Value
a list with father
representing the parent node, and
balance.depth
representing the distance between the node and
the farthest node to it, as the elements.
Running depth first search on a tree and calling functions on entrance/exit events
Description
It is used for internal purposes.
Usage
my.general.dfs(
nei,
v,
f,
extra,
in.call,
mid.call.before,
mid.call.after,
out.call
)
Arguments
nei |
Neighbor list for each vertex |
v |
Starting node |
f |
Parent node |
extra |
the shared object for the whole DFS |
in.call |
First function to call |
mid.call.before |
Function to call before calling child DFS |
mid.call.after |
Function to call after calling child DFS |
out.call |
Last function to call |
Value
the extra
parameter modified with in.call
, mid.call.before
, mid.call.after
, and out.call
functions
Read mutation table
Description
A simple read of a sequencing file.
Usage
read.sequence.table(file.name)
Arguments
file.name |
Name of the file to be loaded |
Value
A table representing the content of the file. First column of the file represents the row names.
Examples
# An example input without header could be like following:
# 1 C/C A/A A/A A/A
# 2 ./. A/A C/C C/C
# 3 C/C A/A C/C ./.
# 4 A/A ./. ./. ./.
# 5 ./. A/A A/A A/A
#
# For this file you can run
read.sequence.table(system.file("extdata/sample1.txt", package="RScelestial"))
Infer the single-cell phylogenetic tree
Description
Performs the Scelestial algorithm and calculates the phylogenetic tree reconstruction based on an approximation algorithm for Steiner tree problem.
Usage
scelestial(
seq,
mink = 3,
maxk = 3,
root.assign.method = c("none", "balance", "fix"),
root = NULL,
return.graph = FALSE
)
Arguments
seq |
The sequence matrix. Rows represent loci and columns represent samples. Elements of the matrix represent 10-state genome sequencing results, or missing values. I.e each element is in the format "X/Y" where X and Y are from the set {A, T, C, G}. There is a special case "./." that represents the missing value. |
mink |
The minimum k used in the calculation of k-restricted Steiner trees. It is supposed to be 3. |
maxk |
The maximum k used in the calculation of k-restricted Steiner trees. When maxk=3, the approximation algorithm produces an 11/6-approximation result. Increasing k increases the running time as well as the approximation ratio of the algorithm. maxk should be not less than mink. |
root.assign.method , root |
root.assign.method is the method for choosing the root.
|
return.graph |
If TRUE, the actual graph through igraph library is generated and produced. |
Value
Returns a list containing following elements:
-
tree
: A data frame representing edges of the tree.tree$src
is the source of the edge,tree$dest
represents the destination of the edge, andtree$len
represents its weight (evolutionary distance). -
input
: input sequences. -
sequence
: inferred or imputed sequences for the tree nodes. If the node is already in the input, sequence represents its missing value imputation, in the case of presence of missing values, and if the node is not an input node, the sequence represents inferred sequence for the tree node. -
graph
: graph. If the return.graph is TRUE, there is an element G that represents the graph from the igraph library.
Examples
## simulates tumor evolution
S = synthesis(10, 10, 2, seed=7)
## convert to 10-state matrix
seq = as.ten.state.matrix(S$seqeunce)
## runs the scelestial to generate 4-restricted Steiner trees. It represents the tree and graph
SP = scelestial(seq, mink=3, maxk=4, return.graph = TRUE)
SP
## Expected output:
# $input
# node sequence
# 1 0 AAXACAAXXA
# 2 1 AXXXAXAAXA
# 3 2 AXAXCAXXAX
# 4 3 AXCCCAXAAX
# 5 4 AXCXAXXCAX
# 6 5 XXCAXXXXXX
# 7 6 XACXACAAAC
# 8 7 AXAXXAXAXA
# 9 8 AXAAXXAXXX
# 10 9 AAXXXXCXCX
#
# $sequence
# node sequence
# 1 0 AAAACAAACA
# 2 1 AACAAAAAAA
# 3 2 AAAACAAAAA
# 4 3 AACCCAAAAA
# 5 4 AACAACACAC
# 6 5 AACAACAAAC
# 7 6 AACAACAAAC
# 8 7 AAAACAAACA
# 9 8 AAAACAAACA
# 10 9 AAAACACACA
# 11 10 AAAACAAACA
# 12 16 AACAAAAAAA
# 13 18 AACACAAAAA
#
# $tree
# src dest len
# 1 9 10 4.00006
# 2 8 10 3.00006
# 3 7 10 2.50005
# 4 0 10 1.50003
# 5 6 16 3.00002
# 6 1 16 2.50005
# 7 3 18 2.50003
# 8 0 18 1.50003
# 9 16 18 1.00000
# 10 0 2 3.50008
# 11 4 6 4.00007
# 12 5 6 4.50010
#
# $graph
# IGRAPH 6ba60f3 DNW- 13 12 --
# + attr: name (v/c), weight (e/n)
# + edges from 6ba60f3 (vertex names):
# [1] 9 ->10 8 ->10 7 ->10 0 ->10 6 ->16 1 ->16 3 ->18 0 ->18 16->18 0 ->2
# [11] 4 ->6 5 ->6
#
Synthesize single-cell data through tumor simulation
Description
This function simulates a evolution in a tumor through two phases: 1) simulation of evolution, 2) sampling.
Usage
synthesis(
sample,
site,
evolution.step,
mutation.rate = 1,
advantage.increase.ratio = 1,
advantage.decrease.ratio = 10,
advantage.keep.ratio = 100,
advantage.increase.step = 0.01,
advantage.decrease.step = 0.01,
mv.rate = 0.5,
fp.rate = 0.2,
fn.rate = 0.1,
seed = -1
)
Arguments
sample |
Number of samples. |
site |
number of sites (loci) |
evolution.step |
Number of evolutionary steps in the process of production of the evolutionary tree. |
mutation.rate |
The rate of mutation on each evolutionary step in evolutionary tree synthesis. |
advantage.increase.ratio , advantage.decrease.ratio , advantage.keep.ratio |
A child node
in the evolutionary tree is chosen for increase/decrease/keep its parent advantage with
probabilities proportional to |
advantage.increase.step , advantage.decrease.step |
The amount of increasing or decreasing the advantage of a cell relative to its parent. |
mv.rate |
Rate of missing value to be added to the resulting sequences. |
fp.rate , fn.rate |
Rate of false positive (0 -> 1) and false negative (1 -> 0) in the sequences. |
seed |
The seed for randomization. |
Details
The simulation of evolution starts with a single cell.
Then for evolution.step
steps, on each step a cell is selected for duplication.
A new cell as its child is added to
the evolutionary tree. To each node in the evolutionary tree an advantage is assigned
representing its relative advantage in replication and in being sampled. Advantage of a node
is calculated by increasing (decreasing) its parents advantage by advantage.increase.step
(advantage.decrease.step
) with probability proportional to advantage.increase.ratio
(advantage.decrease.ratio
).
With a probability proportional to advantage.keep.ratio
the advantage of a node
is equal to its parent's advantage.
Sequences for each node is build based on its parent's sequence by adding some mutations.
Mutations are added for each locus independently with rate mutation.rate
.
In the sampling phase, sample
cells are selected from the evolutionary tree nodes.
Result of the sequencing process for a cell is determined by the sequence of the node in the evolutionary tree
with addition of some random errors. Errors are result of applying some false positives with rate fp.rate
,
applying some false negatives with rate fn.rate
, and adding some missing values
with rate mv.rate
.
Value
The function returns a list. The list consists of
-
sequence
: A data frame representing result of sequencing. The data frame has a row for each locus and a column for each sample. -
true.sequence
: The actual sequence for the sample before adding errors and missing values. -
true.clone
: A list that stores index of sampled cells for each node in the evolutionary tree. -
true.tree
: The evolutionary tree that the samples are sampled from. It is a data frame withsrc
,dest
, andlen
columns representing source, destination and weight of edges of the tree, respectively.
Examples
## generating a data set with 10 samples and 5 loci through simulation of
## 20-step evolution.
synthesis(10, 5, 20, seed=7)
## The result is
# $seqeunce
# C1 C2 C3 C4 C5
# L1 1 1 1 1 1
# L2 3 1 3 3 0
# L3 3 1 3 3 1
# L4 3 0 1 0 0
# L5 1 3 0 3 3
# L6 3 1 3 1 0
# L7 3 3 1 0 3
# L8 3 1 1 3 3
# L9 3 3 1 3 1
# L10 0 3 0 3 0
#
# $true.sequence
# C1 C2 C3 C4 C5
# L1 0 1 1 1 1
# L2 0 1 0 0 1
# L3 0 1 0 0 1
# L4 0 1 1 1 1
# L5 1 1 0 1 0
# L6 0 1 0 1 0
# L7 0 1 0 0 1
# L8 0 1 1 1 1
# L9 0 1 1 1 1
# L10 0 0 0 0 0
#
# $true.clone
# $true.clone[[1]]
# [1] 4
#
# $true.clone[[2]]
# [1] 1
#
# $true.clone[[3]]
# [1] 6
#
# $true.clone[[4]]
# [1] 10
#
# $true.clone[[5]]
# [1] 2
#
# $true.clone[[6]]
# [1] 3
#
# $true.clone[[7]]
# [1] 8 9
#
# $true.clone[[8]]
# [1] 7
#
# $true.clone[[9]]
# [1] 5
#
#
# $true.tree
# src dest len
# 1 1 5 3
# 2 5 7 1
# 3 5 10 2
# 4 1 11 3
# 5 1 12 2
# 6 1 13 3
# 7 7 14 2
# 8 12 19 1
# 9 10 20 1
#
Plotting the tree
Description
Plotting the igraph tree created by scelestial.
Usage
tree.plot(graph, ...)
Arguments
graph |
Output of scelestial or the G element of the scelestial output. |
... |
Parameters passing to the plot function |