Title: | A Bootstrap-Based Power Estimation Tool for Spatial Transcriptomics |
Version: | 0.1.0 |
Imports: | scam, Seurat, dplyr, plotly, resample, xgboost, magrittr, rayshader, ggplot2 |
Suggests: | patchwork, boot, knitr, rmarkdown, fields, rayrender, tidyr |
Description: | Power estimation and sample size calculation for 10X Visium Spatial Transcriptomics data to detect differentially expressed genes between two conditions based on bootstrap resampling. See Shui et al. (2024) <doi:10.1101/2024.08.30.610564> for method details. |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
License: | MIT + file LICENSE |
Depends: | R (≥ 2.10) |
LazyData: | true |
NeedsCompilation: | no |
Packaged: | 2024-09-04 13:17:57 UTC; shuilan |
Author: | Lan Shui |
Maintainer: | Lan Shui <Lan.Shui@uth.tmc.edu> |
Repository: | CRAN |
Date/Publication: | 2024-09-09 09:30:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Bootstrap resampling and power calculation upon ST data
Description
This function performs bootstrap resampling on a Seurat object under each condition so that the resampled data resemble the real dataset, which allows exact power calculation, and then performs DE analysis. Users can specify the test for the DE analysis in '...', which should not contain min.pct, logfc.threshold or any other parameter that attempts to pre-filter genes, because min.pct and logfc.threshold are set to 0 so that power is calculated for all available genes. As a result, the function may need to run overnight if the ST data contain thousands of genes. To speed up the process, try 'PoweREST_subset', which includes gene pre-filtering.
Usage
PoweREST(Seurat_obj,cond,replicates=1,spots_num,
iteration=100,random_seed=1,pvalue=0.05,...)
Arguments
Seurat_obj |
A Seurat object. |
cond |
The name of the variable that indicates the different conditions. The variable must be stored in the meta.data of Seurat_obj and be of character type. |
replicates |
The number of sample replicates per group. |
spots_num |
The number of spots per replicate. |
iteration |
The number of iterations of the resampling. |
random_seed |
The random seed, set for reproducibility. |
pvalue |
The p-value threshold below which a result is considered significant. |
... |
DE test to use other than the default Wilcoxon test. |
Value
A list containing the power, the average log2FC and the percentage of spots detecting the gene across the resampled data, the replicate number and the number of spots per slice specified by the user, and the corresponding gene names.
Author(s)
Lan Shui lshui@mdanderson.org
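The resampling idea in the Description can be illustrated with a minimal base-R sketch for a single gene. It does not call PoweREST or Seurat: the pilot data are simulated, the helper `bootstrap_power` is hypothetical, and pooling `replicates * spots_num` spots per condition is an assumption made for illustration rather than a statement of how the package combines replicates.

```r
# Minimal base-R sketch of bootstrap power estimation for one gene.
# All data are simulated; nothing below calls PoweREST or Seurat.
set.seed(1)

# Hypothetical pilot data: expression of one gene across spots in two conditions.
pilot_a <- rpois(200, lambda = 2)
pilot_b <- rpois(200, lambda = 4)

# Hypothetical helper: proportion of bootstrap iterations in which the test is significant.
bootstrap_power <- function(a, b, replicates = 1, spots_num = 50,
                            iteration = 100, pvalue = 0.05) {
  n <- replicates * spots_num            # assumption: spots are pooled across replicates
  significant <- logical(iteration)
  for (i in seq_len(iteration)) {
    # Resample spots with replacement within each condition.
    boot_a <- sample(a, n, replace = TRUE)
    boot_b <- sample(b, n, replace = TRUE)
    # Two-sample Wilcoxon rank-sum test on the resampled spots.
    p <- suppressWarnings(wilcox.test(boot_a, boot_b)$p.value)
    significant[i] <- isTRUE(p < pvalue)
  }
  mean(significant)
}

power_small <- bootstrap_power(pilot_a, pilot_b, replicates = 1, spots_num = 10)
power_large <- bootstrap_power(pilot_a, pilot_b, replicates = 3, spots_num = 50)
```

Comparing `power_small` with `power_large` shows how the estimated power changes with the number of replicates and spots per replicate.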
Bootstrap resampling and power estimation for one single gene
Description
This function performs bootstrap resampling on a Seurat object under each condition so that the resampled data resemble the real dataset, which allows exact power calculation, and then performs DE analysis on one gene specified by the user. Users can specify the test for the DE analysis in '...'. Note that the results are not corrected for multiple testing and should therefore be interpreted carefully.
Usage
PoweREST_gene(Seurat_obj,cond,replicates=1,spots_num,
gene_name,iteration=100,random_seed=1,pvalue=0.05,...)
Arguments
Seurat_obj |
A Seurat object. |
cond |
The name of the variable that indicates the different conditions. The variable must be stored in the meta.data of Seurat_obj and be of character type. |
replicates |
The number of sample replicates per group. |
spots_num |
The number of spots per replicate. |
gene_name |
The name of the gene for which power is calculated. |
iteration |
The number of iterations of the resampling. |
random_seed |
The random seed, set for reproducibility. |
pvalue |
The p-value threshold below which a result is considered significant. |
... |
DE test to use other than the default Wilcoxon test. |
Value
A list containing the power, the average log2FC and the percentage of spots detecting the gene across the resampled data, the replicate number and the number of spots per slice specified by the user, and the corresponding gene name.
Author(s)
Lan Shui lshui@mdanderson.org
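Because the results are not corrected for multiple testing, the following base-R sketch shows the effect of a correction when several genes are examined one at a time. The p-values are simulated under no true effect, and the Benjamini-Hochberg method is one possible choice rather than anything prescribed by the package.

```r
# Sketch: unadjusted versus Benjamini-Hochberg adjusted p-values.
set.seed(1)

# Hypothetical unadjusted p-values for 1000 genes with no true difference.
p_raw <- runif(1000)

# Number of genes called significant without any correction.
n_raw <- sum(p_raw < 0.05)

# Benjamini-Hochberg adjustment with base R's p.adjust().
p_bh <- p.adjust(p_raw, method = "BH")
n_bh <- sum(p_bh < 0.05)
```

Here `n_raw` counts false positives that arise from testing many genes, while `n_bh` is never larger than `n_raw`.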
Bootstrap resampling and power calculation for a subset of genes
Description
This function performs bootstrap resampling on a Seurat object under each condition so that the resampled data resemble the real dataset, which allows exact power calculation, and then performs DE analysis. As with 'PoweREST', users can specify the test for the DE analysis in '...' (see Seurat for more test options). Unlike 'PoweREST', users can set 'min.pct' and 'logfc.threshold' to pre-filter genes by their minimum detection rate ('min.pct') and by a minimum X-fold difference on the log scale ('logfc.threshold') between the two groups. This kind of filtering can miss weaker signals.
Usage
PoweREST_subset(Seurat_obj,cond,replicates=1,spots_num,
iteration=100,random_seed=1,pvalue=0.05,logfc.threshold = 0.1,
min.pct = 0.01,...)
Arguments
Seurat_obj |
A Seurat object. |
cond |
The name of the variable that indicates the different conditions. The variable must be stored in the meta.data of Seurat_obj and be of character type. |
replicates |
The number of sample replicates per group. |
spots_num |
The number of spots per replicate. |
iteration |
The number of iterations of the resampling. |
random_seed |
The random seed, set for reproducibility. |
pvalue |
The p-value threshold below which a result is considered significant. |
logfc.threshold |
For every resampling, limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups. Default is 0.1. Increasing logfc.threshold speeds up the function, but can miss weaker signals. |
min.pct |
For every resampling, only test genes that are detected in a minimum fraction of min.pct spots in either of the two populations. Meant to speed up the function by not testing genes that are very infrequently expressed. Default is 0.01. |
... |
DE test to use other than the default Wilcoxon test. |
Value
A list containing the power, the average log2FC and the percentage of spots detecting the gene across the resampled data, the replicate number and the number of spots per slice specified by the user, and the names of the filtered genes.
Author(s)
Lan Shui lshui@mdanderson.org
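The pre-filtering controlled by 'min.pct' and 'logfc.threshold' can be pictured with a small base-R sketch. The expression values are made up, the helper `prefilter_genes` is hypothetical, and the log2 fold change with a pseudocount of 1 is a simplification assumed here; the exact computation used by Seurat may differ.

```r
# Sketch of gene pre-filtering by detection rate and log fold change.
# Toy expression matrices (genes x spots), one per condition; values are made up.
expr_a <- rbind(
  g_rare = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  g_flat = c(1, 1, 0, 1, 1, 1, 0, 1, 1, 1),
  g_up   = c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1)
)
expr_b <- rbind(
  g_rare = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  g_flat = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
  g_up   = c(3, 4, 3, 5, 4, 3, 4, 5, 3, 4)
)

# Hypothetical helper that returns the names of genes passing both filters.
prefilter_genes <- function(a, b, min.pct = 0.01, logfc.threshold = 0.1) {
  # Fraction of spots detecting each gene in each condition.
  pct_a <- rowMeans(a > 0)
  pct_b <- rowMeans(b > 0)
  # Simplified log2 fold change with a pseudocount of 1 (assumption).
  log2fc <- log2(rowMeans(b) + 1) - log2(rowMeans(a) + 1)
  # Keep genes detected often enough in either group and different enough on average.
  keep <- pmax(pct_a, pct_b) >= min.pct & abs(log2fc) >= logfc.threshold
  rownames(a)[unname(keep)]
}

kept <- prefilter_genes(expr_a, expr_b)
```

In this toy data the undetected gene fails the 'min.pct' filter and the gene with equal means fails the 'logfc.threshold' filter, so only `g_up` is tested.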
Fit with XGBoost
Description
This function estimates the power values with XGBoost under three-dimensional monotone constraints on avg_log2FC, avg_PCT and replicates. It is recommended when the power surfaces fitted by 'fit_powerest' cross each other, and it can be used to estimate local power values.
Usage
fit_XGBoost(power,avg_log2FC,avg_PCT,replicates,filter_zero=TRUE,
max.depth=6,eta=0.3,nround=100)
Arguments
power |
The raw power values. |
avg_log2FC |
The corresponding log2FC values. |
avg_PCT |
The corresponding PCT values. |
replicates |
The corresponding replicates number. |
filter_zero |
Whether to remove observations with a power value of 0. Default=TRUE. |
max.depth |
Maximum depth of a tree. Default=6. |
eta |
Controls the learning rate: scales the contribution of each tree by a factor of 0 < eta < 1 when it is added to the current approximation. Used to prevent overfitting by making the boosting process more conservative. Default=0.3. |
nround |
Maximum number of boosting iterations. Default=100. |
Value
An object of class 'xgb.Booster'. More information about the contents of an 'xgb.Booster' object can be found in the documentation of the R package xgboost.
Author(s)
Lan Shui lshui@mdanderson.org
Examples
data(power_example)
# Fit the local power surface of avg_log2FC_abs between 1 and 2
avg_log2FC_abs_1_2<-dplyr::filter(power_example,avg_log2FC_abs>1 & avg_log2FC_abs<2)
# Fit the model
bst<-fit_XGBoost(power_example$power,avg_log2FC=power_example$avg_log2FC_abs,
avg_PCT=power_example$mean_pct,replicates=power_example$sample_size)
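As a rough illustration of what monotone constraints do, the sketch below trains an XGBoost model directly with the xgboost package on simulated data. It is not the internals of 'fit_XGBoost': the data, the squared-error objective and the object names (`bst_toy`, `pred_toy`) are assumptions made for this example.

```r
# Sketch: XGBoost regression with increasing monotone constraints on all three features.
library(xgboost)
set.seed(1)

# Simulated inputs standing in for avg_log2FC, avg_PCT and replicates.
n <- 500
X <- cbind(avg_log2FC = runif(n, 0, 2),
           avg_PCT    = runif(n, 0, 1),
           replicates = sample(2:5, n, replace = TRUE))

# Simulated power that increases with every feature, clipped to [0, 1].
power <- 0.3 * X[, "avg_log2FC"] + 0.4 * X[, "avg_PCT"] +
  0.05 * X[, "replicates"] + rnorm(n, sd = 0.05)
power <- pmin(1, pmax(0, power))

dtrain <- xgb.DMatrix(data = X, label = power)
params <- list(objective = "reg:squarederror",      # assumed objective
               max_depth = 6, eta = 0.3,
               monotone_constraints = c(1, 1, 1))   # +1 = non-decreasing in that feature
bst_toy <- xgb.train(params = params, data = dtrain, nrounds = 100, verbose = 0)

# Predict along avg_log2FC while holding the other two features fixed.
grid <- cbind(avg_log2FC = seq(0, 2, length.out = 20),
              avg_PCT = 0.5, replicates = 3)
pred_toy <- predict(bst_toy, xgb.DMatrix(grid))
```

With the constraints in place, `pred_toy` does not decrease as avg_log2FC grows, which is what keeps fitted power surfaces for different replicate numbers from crossing.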
Fit the power surface
Description
This function takes the power values with the corresponding avg_log2FC and avg_PCT derived from bootstrap sampling and uses the scam package to fit two-dimensional smoothing splines under two monotone constraints: (1) a positive relationship between power and avg_log2FC; (2) a positive relationship between power and avg_PCT. The values of avg_log2FC and avg_PCT can be either the averages over the bootstrap samples or the values from the original spatial transcriptomics data.
Usage
fit_powerest(power,avg_log2FC,avg_PCT,filter_zero=TRUE)
Arguments
power |
The raw power values. |
avg_log2FC |
The corresponding log2FC values. |
avg_PCT |
The corresponding PCT values. |
filter_zero |
Whether to remove observations with a power value of 0, default=TRUE. |
Value
A 'scam' object returned by the scam function. More information about the contents of a 'scam' object can be found in the documentation of the R package scam.
Author(s)
Lan Shui lshui@mdanderson.org
Examples
data(result_example)
b<-fit_powerest(result_example$power,result_example$avg_logFC,result_example$avg_PCT)
3D interactive visualization
Description
This function creates a 3D interactive plot of the power against the other parameters using 'plot_ly'.
Usage
plotly_powerest(pred,opacity=0.8,colors='BrBG',fig_title=NULL)
Arguments
pred |
The result from 'pred_powerest'. |
opacity |
The opacity of the graph, default=0.8. |
colors |
The color for the graph, default='BrBG'. |
fig_title |
The title of the graph, default=NULL. |
Value
A 3d interactive plot of the power surface. Users can also plot multiple surfaces together to compare them.
Author(s)
Lan Shui lshui@mdanderson.org
Examples
data(result_example)
b<-fit_powerest(result_example$power,result_example$avg_logFC,result_example$avg_PCT)
pred <- pred_powerest(b,xlim= c(0,6),ylim=c(0,1))
plotly_powerest(pred,fig_title='Power estimation result')
An example of power results with multiple replicate numbers
Description
A subset of power results with multiple replicate numbers from PoweREST
Usage
power_example
Format
power_example
A data frame with 844 rows and 5 columns:
- avg_logFC
average log2FC
- mean_pct
percentage of spots detecting the gene
- sample_size
number of replicates
- power
power values
- avg_log2FC_abs
the absolute value of average log2FC
Prediction results from XGBoost
Description
This function takes the result from 'fit_XGBoost' and makes predictions.
Usage
pred_XGBoost(x,n.grid=30,xlim,ylim,replicates)
Arguments
x |
An object of class 'xgb.Booster'. |
n.grid |
The number of grid nodes within 'xlim' and 'ylim', default=30. |
xlim |
The range of the absolute value of avg_log2FC used for prediction. |
ylim |
The range of the avg_pct used for prediction. |
replicates |
The replicates number. |
Value
The power estimations from XGBoost.
Author(s)
Lan Shui lshui@mdanderson.org
Examples
data(power_example)
# Fit the local power surface of avg_log2FC_abs between 1 and 2
avg_log2FC_abs_1_2<-dplyr::filter(power_example,avg_log2FC_abs>1 & avg_log2FC_abs<2)
# Fit the model
bst<-fit_XGBoost(power_example$power,avg_log2FC=power_example$avg_log2FC_abs,
avg_PCT=power_example$mean_pct,replicates=power_example$sample_size)
pred<-pred_XGBoost(bst,n.grid=30,xlim=c(0,1.5),ylim=c(0,0.1),replicates=3)
Power value prediction
Description
This function provides predictions from the fitted 'scam' object returned by 'fit_powerest'. The predictions can be visualized with 'plotly_powerest' and 'vis_powerest' or reported as the power result in a proposal or study. It is a modified version of the scam library code predict.scam.
Usage
pred_powerest(x,n.grid=30,xlim=NULL,ylim=NULL)
Arguments
x |
The fitted 'scam' object from 'fit_powerest'. |
n.grid |
The number of grid nodes within 'xlim' and 'ylim', default=30. |
xlim |
The range of the absolute value of log2FC used for prediction, default=NULL which means the original range. |
ylim |
The range of the avg_pct used for prediction, default=NULL which means the original range. |
Value
The prediction values of the power.
Author(s)
Lan Shui lshui@mdanderson.org based partly on 'scam' by Natalya Pya
Examples
data(result_example)
b<-fit_powerest(result_example$power,result_example$avg_logFC,result_example$avg_PCT)
pred <- pred_powerest(b,xlim= c(0,6),ylim=c(0,1))
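The roles of 'n.grid', 'xlim' and 'ylim' can be pictured with a base-R sketch that evaluates a surface on a regular grid. The function `toy_power` is a hypothetical stand-in for a fitted model, not the scam fit, and the regular-grid layout is an assumption about how predictions are arranged.

```r
# Sketch: evaluating a power surface on an n.grid x n.grid regular grid.
n.grid <- 30
xlim <- c(0, 6)   # range of the absolute log2FC
ylim <- c(0, 1)   # range of the detection percentage

# All combinations of the grid nodes; the first variable varies fastest.
grid <- expand.grid(
  avg_log2FC = seq(xlim[1], xlim[2], length.out = n.grid),
  avg_PCT    = seq(ylim[1], ylim[2], length.out = n.grid)
)

# Hypothetical stand-in for a fitted model: increasing in both inputs.
toy_power <- function(lfc, pct) 1 - exp(-lfc * pct)
grid$power <- toy_power(grid$avg_log2FC, grid$avg_PCT)

# Reshape to a matrix: rows follow avg_log2FC, columns follow avg_PCT.
z <- matrix(grid$power, nrow = n.grid, ncol = n.grid)
```

A matrix such as `z`, together with the two node vectors, is the form that base graphics functions like `persp()` and `contour()` expect.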
An example of power results from PoweREST
Description
A subset of power results from PoweREST by running PoweREST(Peri,cond='Condition', replicates=5,spots_num=80,iteration=2)
Usage
result_example
Format
result_example
A data frame with ~20,000 rows and 3 columns:
- power
power values
- avg_logFC
average log2FC
- avg_PCT
percentage of spots detecting the gene
Visualization of the power estimations from XGBoost
Description
This function takes the result from 'pred_XGBoost' and plots 2D/3D views of it.
Usage
vis_XGBoost(x,view='2D',legend_name='Power',
xlab='avg_log2FC_abs',ylab='mean_pct')
Arguments
x |
The result data frame from 'pred_XGBoost'. |
view |
Determines whether to plot a 2D or 3D view, default='2D'. |
legend_name |
The name of legend, default='Power'. |
xlab |
The name of xlab, default='avg_log2FC_abs'. |
ylab |
The name of ylab, default='mean_pct'. |
Value
A 2D/3D plot of the power results from XGBoost.
Author(s)
Lan Shui lshui@mdanderson.org
Examples
data(power_example)
# Fit the local power surface of avg_log2FC_abs between 1 and 2
avg_log2FC_abs_1_2<-dplyr::filter(power_example,avg_log2FC_abs>1 & avg_log2FC_abs<2)
# Fit the model
bst<-fit_XGBoost(power_example$power,avg_log2FC=power_example$avg_log2FC_abs,
avg_PCT=power_example$mean_pct,replicates=power_example$sample_size)
pred<-pred_XGBoost(bst,n.grid=30,xlim=c(0,1.5),ylim=c(0,0.1),replicates=3)
vis_XGBoost(pred,view='2D',legend_name='Power',xlab='avg_log2FC_abs',ylab='mean_pct')
Visualization of the power surface
Description
This function takes the result from 'pred_powerest' and plots 2D views of it. Supply ticktype="detailed" to get proper axis annotation. It is a modified version of the 'scam' library code 'vis.scam'.
Usage
vis_powerest(x,color="heat",contour.col=NULL,
se=-1,zlim=NULL,n.grid=30,col=NA,plot.type="persp",
nCol=50,...)
Arguments
x |
The result from 'pred_powerest'. |
color |
The color of the plot which can be one of the "heat", "topo", "cm", "terrain", "gray" or "bw". |
contour.col |
The color of the contour plot when using plot.type="contour". |
se |
If less than or equal to zero then only the predicted surface is plotted, but if greater than zero, then 3 surfaces are plotted, one at the predicted values minus se standard errors, one at the predicted values and one at the predicted values plus se standard errors. |
zlim |
The range of power values to display. |
n.grid |
The number of grid nodes in each direction used for calculating the plotted surface. |
col |
The colors for the facets of the plot. If this is NA then if se>0 the facets are transparent, otherwise the color scheme specified in color is used. If col is not NA then it is used as the facet color. |
plot.type |
One of "contour" or "persp". |
nCol |
The number of colors to use in color schemes. |
... |
Other arguments. |
Value
A 2d plot of the power surface. More details can be found in the documentation of the R package scam.
Author(s)
Lan Shui lshui@mdanderson.org based partly on 'scam' by Natalya Pya
Examples
data(result_example)
b<-fit_powerest(result_example$power,result_example$avg_logFC,result_example$avg_PCT)
pred <- pred_powerest(b,xlim= c(0,6),ylim=c(0,1))
vis_powerest(pred,theta=-30,phi=30,color='heat',ticktype = "detailed",xlim=c(0,6),nticks=5)