Type: | Package |
Title: | A Semi-Supervised Category Identification and Assignment Tool |
Version: | 1.2.0 |
Date: | 2019-7-16 |
Author: | Ze Zhang |
Maintainer: | Ze Zhang <Ze.Zhang@utsouthwestern.edu> |
Depends: | R (≥ 2.15.0), MASS, gplots |
Description: | An automatic cell type detection and assignment algorithm for single cell RNA-Seq and Cytof/FACS data. 'SCINA' is capable of assigning cell type identities to a pool of cells profiled by scRNA-Seq or Cytof/FACS data with prior knowledge of markers, such as genes and protein symbols that are highly or lowly expressed in each category. See Zhang Z, et al (2019) <doi:10.3390/genes10070531> for more details. |
URL: | http://lce.biohpc.swmed.edu/scina/ https://github.com/jcao89757/SCINA |
License: | GPL-2 |
NeedsCompilation: | no |
RoxygenNote: | 6.1.0 |
Packaged: | 2019-07-17 18:39:19 UTC; s421955 |
Repository: | CRAN |
Date/Publication: | 2019-07-18 06:38:25 UTC |
A semi-supervised cell type identification and assignment tool.
Description
An automatic cell type detection and assignment algorithm for single cell RNA-Seq (scRNA-seq) and Cytof/FACS data. See Zhang Z, et al (2019) <doi:10.3390/genes10070531> for more details.
Usage
SCINA(exp, signatures, max_iter = 100, convergence_n = 10, convergence_rate = 0.99,
sensitivity_cutoff = 1, rm_overlap = 1, allow_unknown = 1, log_file = "SCINA.log")
Arguments
exp |
A normalized matrix representing the gene expression levels. Log-transformation is suggested to avoid heavy-tailed datasets. Columns correspond to cells, rows correspond to genes or protein symbols. |
signatures |
A list containing multiple signature vectors. Each signature vector contains genes or protein symbols, representing the prior knowledge for one cell type. |
max_iter |
An integer > 0. Default is 100. Max iterations allowed for the EM algorithm. |
convergence_rate |
A float between 0 and 1. Default is 0.99. The fraction of cells whose type assignment must remain stable over the last convergence_n rounds. |
sensitivity_cutoff |
A float between 0 and 1. Default is 1. The cutoff used to remove signatures whose cell types the SCINA algorithm deems absent from the data. |
rm_overlap |
A binary value, default 1 (TRUE), meaning that symbols shared between signature lists are removed. If 0 (FALSE), different cell types are allowed to share the same identifiers. |
allow_unknown |
A binary value, default 1 (TRUE). If 0 (FALSE) then no cell will be assigned to the 'unknown' category. |
convergence_n |
An integer > 0. Default is 10. The SCINA algorithm stops if, during the last convergence_n rounds of iterations, the cell type assignment remains stable above the convergence_rate. |
log_file |
A string naming the file that records the running status of the SCINA algorithm. Default is 'SCINA.log'. |
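As a rough illustration of the input shapes described above, the following base-R sketch builds a toy genes-by-cells matrix and a named signature list. The gene names, cell names, and counts are invented for illustration, the object names exp_toy and signatures_toy are hypothetical, and library-size normalization is omitted for brevity.

```r
# Toy inputs in the shape SCINA() expects: a genes-by-cells matrix and a
# named list of signature vectors. All names and values here are made up.
set.seed(1)
genes <- c("CD3E", "CD19", "CD14", "GAPDH")
cells <- paste0("cell", 1:6)

# Raw counts: rows are genes, columns are cells.
counts <- matrix(rpois(length(genes) * length(cells), lambda = 20),
                 nrow = length(genes),
                 dimnames = list(genes, cells))

# Log-transform to reduce heavy tails, as suggested for 'exp'.
exp_toy <- log1p(counts)

# One character vector of markers per cell type; list names are cell types.
signatures_toy <- list(
  T_cell   = c("CD3E"),
  B_cell   = c("CD19"),
  Monocyte = c("CD14")
)

# Sanity check: every marker should be a row name of the expression matrix.
markers_present <- all(unlist(signatures_toy) %in% rownames(exp_toy))
```

A real analysis would use many more cells and several markers per cell type; the toy matrix is only meant to show orientation and naming.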
Details
More detailed information can be found on our web server: http://lce.biohpc.swmed.edu/scina/.
For any symbol in a signature list, if the cell type is identified by a low detection level of symbol X, please specify the symbol as 'low_X'. The name of each list element is the cell type.
Details for 'low_X' (taking scRNA-Seq as an example):
(a) There are 4 cell types; the first one highly expresses a gene A, and the other three lowly express the same gene.
Then it is better to specify A as the high marker for cell type 1, but it is not a good idea to specify A as the low
expression marker for cell types 2, 3, and 4.
(b) There are 4 cell types; the first one lowly expresses a gene A, and the other three highly express the same gene.
Then it is better to specify A as the low marker for cell type 1, but it is not a good idea to specify A as the
high expression marker for cell types 2, 3, and 4.
(c) There are 4 cell types; the first one lowly expresses a gene A, the second and third moderately express gene A,
and the last one highly expresses gene A. Then it is better to specify A as the low marker for cell type 1, and as the high
expression marker for cell type 4.
(d) The same specification can be applied to protein markers in CyTOF analysis.
A smaller sensitivity_cutoff leads to more signatures being removed; 1 means that no signature is removed.
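Case (c) above can be written as a signature list along the following lines. This is only a sketch: the cell type names and the extra markers B and C are hypothetical placeholders, and strip_low is an illustrative helper that is not part of SCINA.

```r
# Case (c): gene A is low in type 1, moderate in types 2 and 3, high in type 4.
# Cell type names and the markers "B" and "C" are hypothetical placeholders.
signatures_low <- list(
  type1 = c("low_A"),  # low detection of A identifies type 1
  type2 = c("B"),      # A is not used here; another marker is needed
  type3 = c("C"),
  type4 = c("A")       # high expression of A identifies type 4
)

# Illustrative helper (not part of SCINA): strip the 'low_' prefix to recover
# the underlying symbols referenced by the signatures.
strip_low <- function(x) sub("^low_", "", x)
symbols <- unique(strip_low(unlist(signatures_low, use.names = FALSE)))
```

Listing the underlying symbols this way is a convenient check, under the assumption that each of them should be present among the row names of exp.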
Value
cell_labels is a vector containing the cell type mapping result for each cell.
probabilities is a matrix giving, for each cell, the predicted probability that it belongs to each cell type.
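The two components can be inspected with base R. The sketch below uses a mock object rather than real SCINA output: all values are invented, and the orientation of the probability matrix (cell types in rows, cells in columns) is an assumption that should be checked against actual results.

```r
# Mock object with the two documented components; values are invented.
# Assumption: rows of 'probabilities' are cell types and columns are cells.
results_mock <- list(
  cell_labels = c("T_cell", "B_cell", "unknown", "T_cell"),
  probabilities = matrix(
    c(0.9, 0.1,
      0.2, 0.8,
      0.5, 0.5,
      0.7, 0.3),
    nrow = 2,
    dimnames = list(c("T_cell", "B_cell"), paste0("cell", 1:4))
  )
)

# Number of cells assigned to each label, including 'unknown'.
label_counts <- table(results_mock$cell_labels)

# Highest probability per cell, usable as a rough confidence measure.
max_prob <- apply(results_mock$probabilities, 2, max)
```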
Examples
load(system.file('extdata','example_expmat.RData', package = "SCINA"))
load(system.file('extdata','example_signatures.RData', package = "SCINA"))
exp = exp_test$exp_data
results = SCINA(exp, signatures, max_iter = 120, convergence_n = 12,
convergence_rate = 0.999, sensitivity_cutoff = 0.9)
table(exp_test$true_label, results$cell_labels)
A function to plot SCINA results in a heatmap.
Description
A function to plot SCINA results in a heatmap.
Usage
plotheat.SCINA(exp, results, signatures)
Arguments
exp |
See more details in |
results |
An output object returned from SCINA. |
signatures |
See more details in |
Value
Plots a heatmap showing the expression levels of the signature genes and the SCINA-predicted cell types.
A function to convert signatures uploaded via .csv files to lists used by SCINA.
Description
A function to convert signatures uploaded via .csv files to lists used by SCINA.
Usage
preprocess.signatures(file_path)
Arguments
file_path |
The path of the .csv file. The first row of the file should be cell type names. Each column is occupied by the signature genes/protein markers for the cell type in the first row. Please find more details in |
Value
A list of signature gene lists as an input for SCINA.
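To make the expected file layout concrete, the sketch below writes a small .csv file with cell type names in the first row and reads it back into a named list using base R only. It mimics the described conversion but is not the package's implementation; the marker names and the object names csv_path, tab, and sig_list are illustrative.

```r
# Write a small signature file in the described layout:
# first row = cell type names, each column = markers for that cell type.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("T_cell,B_cell",
             "CD3E,CD19",
             "CD3D,MS4A1",
             "CD2,"), csv_path)

# Base-R sketch of the conversion (not the package's implementation):
# read the columns as character vectors and drop empty or missing cells.
tab <- read.csv(csv_path, stringsAsFactors = FALSE, check.names = FALSE)
sig_list <- lapply(tab, function(col) col[!is.na(col) & col != ""])

# With the package installed, the documented call would be
# preprocess.signatures(csv_path).
str(sig_list)
```

Columns of unequal length are padded with empty cells in the file, which is why the sketch filters out empty strings before building the list.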