Title: | Bayesian Model for Genotyping using RNA-Seq |
Version: | 1.0.0 |
Description: | The method models RNA-seq reads using a mixture of 3 beta-binomial distributions to generate posterior probabilities for genotyping bi-allelic single nucleotide polymorphisms. Elena Vigorito, Anne Barton, Costantino Pitzalis, Myles J. Lewis and Chris Wallace (2023) <doi:10.1093/bioinformatics/btad393> "BBmix: a Bayesian beta-binomial mixture model for accurate genotyping from RNA-sequencing." |
License: | GPL-2 |
Encoding: | UTF-8 |
Biarch: | true |
Depends: | R (≥ 3.4.0) |
Imports: | methods, Rcpp (≥ 0.12.0), rstan (≥ 2.18.1), R.utils, data.table, rmutil |
LinkingTo: | BH (≥ 1.66.0), Rcpp (≥ 0.12.0), RcppEigen (≥ 0.3.3.3.0), rstan (≥ 2.18.1), StanHeaders (≥ 2.18.0) |
SystemRequirements: | GNU make |
RoxygenNote: | 7.1.0 |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2023-07-05 18:36:26 UTC; ev250 |
Author: | Elena Vigorito [aut, cre], Chris Wallace [aut] |
Maintainer: | Elena Vigorito <elena.vigorito@mrc-bsu.cam.ac.uk> |
Repository: | CRAN |
Date/Publication: | 2023-07-06 14:00:09 UTC |
The 'bbmix' package.
Description
Bayesian Beta-Binomial mixture model for RNA-seq genotyping
References
Stan Development Team (2018). RStan: the R interface to Stan. R package version 2.18.2. https://mc-stan.org
Call genotypes using beta binomial after model training
Description
Call genotypes using beta binomial after model training
Usage
call_gt(
allele_counts_f,
depth = 10,
stan_f = NULL,
legend_f,
pop = "EUR",
prob = 0.99,
fisher_f = NULL,
fisher = 30,
cluster_f = NULL,
out
)
Arguments
allele_counts_f |
vector with file names with allele counts for SNPs |
depth |
min read count to call variant |
stan_f |
full name to stan object with model fit to extract mean of parameters. Defaults to the model trained with genome in a bottle reads. Otherwise this object can be generated with fit_bb function. |
legend_f |
full name for file with SNP info to get allele frequency for prior |
pop |
population to select AF for GT prior, defaults to EUR |
prob |
cut-off for making hard calls, defaults to 0.99 |
fisher_f |
file with Fisher test to detect strand bias |
fisher |
cut_off for Fisher test to detect strand bias |
cluster_f |
file with info about SNP clusters |
out |
character with file name to save genotype output |
Value
data table with genotype probabilities
Examples
## Retrive input files for running call_gt
counts_f <- system.file("extdata/input", "NA12878.chr22.Q20.allelicCounts.txt",
package = "bbmix",
mustWork = TRUE)
legend <- system.file("extdata/input", "1000GP_Phase3_chr22.legend",
package = "bbmix", mustWork = TRUE)
fisher_f <- system.file("extdata/input", "chr22.FS.Q20.alleleCounts.txt",
package = "bbmix", mustWork = TRUE)
cluster_f <- system.file("extdata/input", "fSNPs_22_RP_maf0_01_cluster3window35.txt",
package = "bbmix", mustWork = TRUE)
out <- paste0(tempdir() , "/NA12878.chrom22.gt.txt")
## Run call_gt:
call_gt(allele_counts_f = counts_f,
legend_f = legend,
fisher_f = fisher_f,
cluster_f = cluster_f,
out = out)
unlink(out)
call gt helper, calculate mean dbetabinom from all posterior samples
Description
call gt helper, calculate mean dbetabinom from all posterior samples
Usage
call_help(n, m, mu, lambda)
Arguments
n |
counts alt allele |
m |
total counts |
mu |
vector with posterior draws for mu param |
lambda |
vector with posterior draws for lambda param |
Value
mean of dbetabinom
Exclude fSNPs with no alternative allele in any sample. Also exclude fSNPs if all samples are hom.
Description
Exclude fSNPs with no alternative allele in any sample. Also exclude fSNPs if all samples are hom.
Usage
ex_alt_hom(gt_f, out)
Arguments
gt_f |
character vector with file names with genotype calls per sample |
out |
file name to save output |
Value
save file
Examples
gt_f <- system.file("extdata/output", "gt.NA12878.chr22.txt",
package = "bbmix",
mustWork = TRUE)
out <- tempfile()
## Running function
ex_alt_hom(gt_f, out)
unlink(out)
Fit beta binomial distribution to allelic counts for homozygous reference, heterozygous, homozygous alternative
Description
Fit beta binomial distribution to allelic counts for homozygous reference, heterozygous, homozygous alternative
Usage
fit_bb(
counts_f,
depth = 10,
N = 1000,
prefix = NULL,
k = 3,
alpha_p = c(1, 10, 499),
beta_p = c(499, 10, 1),
out,
mc.cores = NULL
)
Arguments
counts_f |
file name with allele counts for SNPs |
depth |
depth cut-off to use to select SNPs to fit distributions |
N |
number of SNPs to use for fitting |
prefix |
charcter with prefix to add for saving files, defaults to NULL |
k |
number of components for mixture model, defaults to 3 |
alpha_p |
alpha parameter for the k components of alpha parameter |
beta_p |
beta paramenter for the k components of Beta parameter |
out |
character with dir name to save output |
mc.cores |
number of cores to use, defaults to parallel detected cores |
Value
saves stan object to file
Examples
## Not run:
## Retrive input files for running call_gt
counts_f <- system.file("extdata/input", "NA12878.chr22.Q20.allelicCounts.txt",
package = "bbmix",
mustWork = TRUE)
out <- tempdir()
fit_bb(counts_f = counts_f, N=10,
out = out, mc.cores=1)
unlink(out)
## End(Not run)
call gt helper, get posterior mean, expected gt and sd expected gt across all samples
Description
call gt helper, get posterior mean, expected gt and sd expected gt across all samples
Usage
gt_help(stan_samples, pop, data)
Arguments
stan_samples |
matrix with samples extracted from stan fit object, params mu and lambda |
pop |
population to select AF for GT prior, defaults to EUR |
data |
data table 1 row with counts and EAF to apply model |
Value
gt_help()
Pool randomly selected reads from different files
Description
Pool randomly selected reads from different files
Usage
poolreads(files, N = 1000, d = 10, out)
Arguments
files |
names for files to extract reads |
N |
number of reads to extract |
d |
depth for reads |
out |
file name to save reads |
Value
save files
Examples
counts_f <- system.file("extdata/input", "NA12878.chr22.Q20.allelicCounts.txt",
package = "bbmix",
mustWork = TRUE)
## In this example we only use one file and we take a pool of 10 reads
out <- tempfile()
poolreads(files=counts_f,
N=10,
d=10,
out = out)
unlink(out)