% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dsb.r
\name{DSBNormalizeProtein}
\alias{DSBNormalizeProtein}
\title{Normalize single cell antibody derived tag (ADT) protein data with the dsb method}
\usage{
DSBNormalizeProtein(
  cell_protein_matrix,
  empty_drop_matrix,
  denoise.counts = TRUE,
  use.isotype.control = TRUE,
  isotype.control.name.vec = NULL,
  define.pseudocount = FALSE,
  pseudocount.use,
  quantile.clipping = FALSE,
  quantile.clip = c(0.001, 0.9995),
  return.stats = FALSE
)
}
\arguments{
\item{cell_protein_matrix}{Raw protein ADT count data to be normalized, with cells as columns and proteins as rows. This matrix is defined after quality control removal of outlier cells from the filtered output of Cell Ranger (see vignette). Output from any other CITE-seq count alignment tool can be used as well.}

\item{empty_drop_matrix}{Raw empty droplet protein count data used for background correction, with droplets as columns and proteins as rows. This can be defined from the raw output of Cell Ranger (see vignette). Any count alignment tool for CITE-seq can be used to align and define these background droplets.}

\item{denoise.counts}{TRUE (default). It is recommended to keep this TRUE and to use it with use.isotype.control = TRUE. This runs step II of the dsb algorithm, which defines and removes cell to cell technical noise.}

\item{use.isotype.control}{TRUE (default). It is recommended to use this with denoise.counts = TRUE. This includes isotype controls in defining the dsb technical component.}

\item{isotype.control.name.vec}{A vector of the names of the isotype control proteins in the rows of the cell and background matrices, e.g. isotype.control.name.vec = c('isotype1', 'isotype2') or rownames(cells_citeseq_mtx)[grepl('sotype', rownames(cells_citeseq_mtx))]}

\item{define.pseudocount}{FALSE (default) uses a pseudocount of 10, a value optimized for protein ADT data. Set to TRUE to supply a different value through pseudocount.use.}

\item{pseudocount.use}{The pseudocount to use when overriding the default pseudocount by setting define.pseudocount = TRUE.}

\item{quantile.clipping}{FALSE (default). If outliers or a large range of values is seen for some proteins (e.g. -50 to 50), re-run with quantile.clipping = TRUE. This clips values at the quantiles given in quantile.clip (by default the 0.001 and 0.9995 quantiles) to handle low and high magnitude outliers.}

\item{quantile.clip}{If quantile.clipping = TRUE, a vector of the lowest and highest quantiles at which to clip; these can be tuned to the dataset size. The default c(0.001, 0.9995) is optimized to clip only a few of the most extreme outliers.}

\item{return.stats}{If TRUE, returns a list. Element 1, $dsb_normalized_matrix, is the normalized ADT matrix. Element 2, $dsb_stats, holds the internal stats used by dsb during denoising (the background mean, the isotype control values, and the final dsb technical component that is regressed out of the counts).}
}
\value{
Normalized ADT data are returned as a standard R "matrix" with cells as columns and proteins as rows, which can be added to a Seurat, SingleCellExperiment or Python anndata object (see vignette). If return.stats = TRUE, the function returns a list: x$dsb_normalized_matrix is the normalized matrix; x$protein_stats holds the mean and sd of the log transformed cell, background and dsb normalized values (as a list); x$technical_stats includes the dsb technical component value for each cell and each variable used to calculate the technical component.
}
\description{
Normalize single cell antibody derived tag (ADT) protein data with the DSBNormalizeProtein function. This single function runs step I (ambient protein background correction) and step II (defining and removing cell to cell technical variation) of the dsb normalization method. See <https://www.biorxiv.org/content/10.1101/2020.02.24.963603v3> for details of the algorithm.
}
\examples{
library(dsb) # example data cells_citeseq_mtx and empty_drop_citeseq_mtx are lazy loaded with the package

# use a subset of cells and background droplets from example data
cells_citeseq_mtx = cells_citeseq_mtx[ ,1:400]
empty_drop_matrix = empty_drop_citeseq_mtx[ ,1:400]

# example I
adt_norm = dsb::DSBNormalizeProtein(
  # step I: remove ambient protein noise reflected in counts from empty droplets
  cell_protein_matrix = cells_citeseq_mtx,
  empty_drop_matrix = empty_drop_matrix,

  # recommended step II: model and remove the technical component of each cell's protein data
  denoise.counts = TRUE,
  use.isotype.control = TRUE,
  isotype.control.name.vec = rownames(cells_citeseq_mtx)[67:70]
)

# example II - experiments without isotype controls
adt_norm = dsb::DSBNormalizeProtein(
  cell_protein_matrix = cells_citeseq_mtx,
  empty_drop_matrix = empty_drop_matrix,
  denoise.counts = FALSE
)

# example III - return dsb internal stats used during denoising for each cell
# returns a 2 element list - the normalized matrix and the internal stats
dsb_object = dsb::DSBNormalizeProtein(
  cell_protein_matrix = cells_citeseq_mtx,
  empty_drop_matrix = empty_drop_matrix,
  isotype.control.name.vec = rownames(cells_citeseq_mtx)[67:70],
  return.stats = TRUE
)

# the dsb normalized matrix to be used in downstream analysis
dsb_object$dsb_normalized_matrix

# the internal dsb stats, which can be examined for outliers (see vignette FAQ)
dsb_object$dsb_stats
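
# example IV - a sketch of quantile clipping for proteins with extreme values
# (assumes the same example data and isotype control rows 67:70 used above;
# the quantile.clip values shown are simply the documented defaults)
adt_norm_clipped = dsb::DSBNormalizeProtein(
  cell_protein_matrix = cells_citeseq_mtx,
  empty_drop_matrix = empty_drop_matrix,
  isotype.control.name.vec = rownames(cells_citeseq_mtx)[67:70],
  quantile.clipping = TRUE,
  quantile.clip = c(0.001, 0.9995)
)

# example V - a sketch of overriding the default pseudocount of 10
# (the value 1 is an arbitrary illustration, not a recommended setting)
adt_norm_pc = dsb::DSBNormalizeProtein(
  cell_protein_matrix = cells_citeseq_mtx,
  empty_drop_matrix = empty_drop_matrix,
  isotype.control.name.vec = rownames(cells_citeseq_mtx)[67:70],
  define.pseudocount = TRUE,
  pseudocount.use = 1
)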

}
\author{
Matthew P. Mulè, \email{mattmule@gmail.com}
}
