Type: | Package |
Title: | Receptor Abundance Estimation using Feature Selection and Gene Set Scoring |
Version: | 1.0.0 |
Maintainer: | Azka Javaid <azka.javaid.gr@dartmouth.edu> |
Description: | Performs receptor abundance estimation for single cell RNA-sequencing data using a supervised feature selection mechanism and a thresholded gene set scoring procedure. Seurat's normalization method is described in: Hao et al., (2021) <doi:10.1016/j.cell.2021.04.048>, Stuart et al., (2019) <doi:10.1016/j.cell.2019.05.031>, Butler et al., (2018) <doi:10.1038/nbt.4096> and Satija et al., (2015) <doi:10.1038/nbt.3192>. Method for reduced rank reconstruction and rank-k selection is detailed in: Javaid et al., (2022) <doi:10.1101/2022.10.08.511197>. Gene set scoring procedure is described in: Frost et al., (2020) <doi:10.1093/nar/gkaa582>. Clustering method is outlined in: Song et al., (2020) <doi:10.1093/bioinformatics/btaa613> and Wang et al., (2011) <doi:10.32614/RJ-2011-015>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
LazyData: | true |
LazyDataCompression: | xz |
Imports: | Ckmeans.1d.dp, Matrix, Seurat, SPECK, stats, VAM |
RoxygenNote: | 7.2.3 |
Depends: | R (≥ 2.10) |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2023-11-17 20:31:39 UTC; azkajavaid |
Author: | H. Robert Frost [aut], Azka Javaid [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2023-11-17 21:10:02 UTC |
STREAK: Receptor Abundance Estimation using Feature Selection and Gene Set Scoring
Description
Performs receptor abundance estimation for single cell RNA-sequencing data using a supervised feature selection mechanism and a thresholded gene set scoring procedure. Seurat's normalization method is described in: Hao et al., (2021) doi:10.1016/j.cell.2021.04.048, Stuart et al., (2019) doi:10.1016/j.cell.2019.05.031, Butler et al., (2018) doi:10.1038/nbt.4096 and Satija et al., (2015) doi:10.1038/nbt.3192. Method for reduced rank reconstruction and rank-k selection is detailed in: Javaid et al., (2022) doi:10.1101/2022.10.08.511197. Gene set scoring procedure is described in: Frost et al., (2020) doi:10.1093/nar/gkaa582. Clustering method is outlined in: Song et al., (2020) doi:10.1093/bioinformatics/btaa613 and Wang et al., (2011) doi:10.32614/RJ-2011-015.
Author(s)
Maintainer: Azka Javaid azka.javaid.gr@dartmouth.edu
Authors:
H. Robert Frost hildreth.r.frost@dartmouth.edu
Receptor abundance estimation for single cell RNA-sequencing (scRNA-seq) data using gene set scoring and thresholding.
Description
Performs receptor abundance estimation for m x n
scRNA-seq target data using gene set scoring and thresholding. scRNA-seq target counts are normalized and
reduced rank reconstructed (RRR) using the SPECK::randomizedRRR()
function. Gene set scoring is next performed leveraging expression from
the top most weighted genes based on the gene sets weights membership matrix with the VAM::vam()
function. The resulting cell-specific gene set scores
are then thresholded utilizing the Ckmeans.1d.dp::Ckmeans.1d.dp()
function. Note that this function only performs normalization
and does not perform any quality control (QC) checks on the inputted target scRNA-seq counts matrix. Any QC needed can be performed on the target matrix before passing it
as an input to the function.
Usage
receptorAbundanceEstimation(
target.rnaseq,
receptor.geneset.matrix,
num.genes = 10,
rank.range.end = 100,
min.consec.diff = 0.01,
rep.consec.diff = 2,
manual.rank = NULL,
seed.rsvd = 1,
max.num.clusters = 4,
seed.ckmeans = 2
)
Arguments
target.rnaseq |
|
receptor.geneset.matrix |
|
num.genes |
Number of top most weighted genes for subsequent gene set scoring and thresholding. |
rank.range.end |
See documentation for the |
min.consec.diff |
See documentation for the |
rep.consec.diff |
See documentation for the |
manual.rank |
See documentation for the |
seed.rsvd |
See documentation for the |
max.num.clusters |
See documentation for the |
seed.ckmeans |
See documentation for the |
Value
-
receptor.abundance.estimates
- Am x h
matrix consisting of abundance estimates form
cells inh
receptors.
Examples
data("train.malt.rna.mat")
data("train.malt.adt.mat")
receptor.geneset.matrix.out <- receptorGeneSetConstruction(train.rnaseq =
train.malt.rna.mat[1:100,1:80],
train.citeseq =
train.malt.adt.mat[1:100,1:2],
rank.range.end = 70,
min.consec.diff = 0.01,
rep.consec.diff = 2,
manual.rank = NULL,
seed.rsvd = 1)
dim(receptor.geneset.matrix.out)
head(receptor.geneset.matrix.out)
data("target.malt.rna.mat")
receptor.abundance.estimates.out <- receptorAbundanceEstimation(target.rnaseq =
target.malt.rna.mat[1:200,1:80],
receptor.geneset.matrix =
receptor.geneset.matrix.out,
num.genes = 10, rank.range.end = 70,
min.consec.diff = 0.01,
rep.consec.diff = 2,
manual.rank = NULL, seed.rsvd = 1,
max.num.clusters = 4, seed.ckmeans = 2)
dim(receptor.abundance.estimates.out)
head(receptor.abundance.estimates.out)
Gene sets weights membership matrix construction for receptor abundance estimation.
Description
Computes n x h
gene sets weights membership matrix using associations learned between log-normalized and reduced rank reconstructed (RRR)
m x n
scRNA-seq training data and m x h
CITE-seq ADT training counts normalized using the centered log ratio (CLR) transformation. scRNA-seq counts are normalized and
RRR using the SPECK::randomizedRRR()
function while CITE-seq counts are normalized using the
Seurat::NormalizeData()
function with the normalization.method
parameter set to CLR
. Spearman rank correlations are computed between the
normalized CITE-seq data and the normalized and RRR scRNA-seq data.
Usage
receptorGeneSetConstruction(
train.rnaseq,
train.citeseq,
rank.range.end = 100,
min.consec.diff = 0.01,
rep.consec.diff = 2,
manual.rank = NULL,
seed.rsvd = 1
)
Arguments
train.rnaseq |
|
train.citeseq |
|
rank.range.end |
See documentation for the |
min.consec.diff |
See documentation for the |
rep.consec.diff |
See documentation for the |
manual.rank |
See documentation for the |
seed.rsvd |
See documentation for the |
Value
-
receptor.geneset.matrix
- An x h
gene sets weights membership matrix where a columni
fromh
corresponds to the weights forn
genes from the scRNA-seq matrix trained against the corresponding CITE-seq ADT transcripth
.
Examples
data("train.malt.rna.mat")
data("train.malt.adt.mat")
receptor.geneset.matrix.out <- receptorGeneSetConstruction(train.rnaseq =
train.malt.rna.mat[1:100,1:80],
train.citeseq =
train.malt.adt.mat[1:100,1:2],
rank.range.end = 70,
min.consec.diff = 0.01,
rep.consec.diff = 2,
manual.rank = NULL, seed.rsvd = 1)
dim(receptor.geneset.matrix.out)
head(receptor.geneset.matrix.out)
Single cell RNA-sequencing (scRNA-seq) target subset of the 10X Genomics MALT counts.
Description
A random subset of joint scRNA-seq/CITE-seq 10X Genomics human extranodal marginal zone B-cell tumor/mucosa-associated lymphoid tissue (MALT) target data. See the dataProcessing.R
file from data-raw
folder for code to recreate data subset.
Usage
target.malt.rna.mat
Format
A scRNA-seq counts matrix of dgCMatrix-class
from the Matrix
package with 4000 cells and 33538 genes.
Source
<https://www.10xgenomics.com/resources/datasets/10-k-cells-from-a-malt-tumor-gene-expression-and-cell-surface-protein-3-standard-3-0-0>
Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) training subset of the 10X Genomics MALT counts.
Description
A random subset of joint scRNA-seq/CITE-seq 10X Genomics human extranodal marginal zone B-cell tumor/mucosa-associated lymphoid tissue (MALT) training data. See the dataProcessing.R
file from data-raw
folder for code to recreate data subset.
Usage
train.malt.adt.mat
Format
A CITE-seq counts matrix of dgeMatrix-class
from the Matrix
package with 1000 cells and 17 genes.
Source
<https://www.10xgenomics.com/resources/datasets/10-k-cells-from-a-malt-tumor-gene-expression-and-cell-surface-protein-3-standard-3-0-0>
Single cell RNA-sequencing (scRNA-seq) training subset of the 10X Genomics MALT counts.
Description
A random subset of joint scRNA-seq/CITE-seq 10X Genomics human extranodal marginal zone B-cell tumor/mucosa-associated lymphoid tissue (MALT) training data. See the dataProcessing.R
file from data-raw
folder for code to recreate data subset.
Usage
train.malt.rna.mat
Format
A scRNA-seq counts matrix of dgCMatrix-class
from the Matrix
package with 1000 cells and 33538 genes.
Source
<https://www.10xgenomics.com/resources/datasets/10-k-cells-from-a-malt-tumor-gene-expression-and-cell-surface-protein-3-standard-3-0-0>