Help for package FDRsampsize

Type:	Package
Title:	Compute Sample Size that Meets Requirements for Average Power and FDR
Version:	1.0
Author:	Stan Pounds <stanley.pounds@stjude.org>
Maintainer:	Stan Pounds <stanley.pounds@stjude.org>
Depends:	R (≥ 2.15.1)
Imports:	stats
Date:	2016-01-06
Description:	Defines a collection of functions to compute average power and sample size for studies that use the false discovery rate as the final measure of statistical significance.
License:	GPL-2
NeedsCompilation:
Packaged:	2016-01-15 01:53:06 UTC; spounds
Repository:	CRAN
Date/Publication:	2016-01-15 10:28:09

An R package to Perform Power and Sample Size Calculations for Microarray Studies

Description

A general approach to performing power and sample size calculations for microarray studies has been developed in the literature. However, the software associated with those articles implements the approach only for studies that will perform the t-test or one-way ANOVA to compare gene expression across two or more groups. Here, we describe a set of R routines that implement the general method for power and sample size calculations for a wider variety of statistical tests. These routines accept the name of a function that computes the power for the statistical test of interest and thus have the flexibility to perform calculations for virtually any statistical test with a known power formula.

Details

Package:	FDRsampsize
Type:	Package
Version:	1.0
Date:	2016-01-06
License:	GPL(>=2)

Author(s)

Stan Pounds <stanley.pounds@stjude.org>

References

A Onar-Thomas, S Pounds. FDRsampsize: An R package to Perform Generalized Power and Sample Size Calculations for Planning Studies that use the False Discovery Rate to Measure Significance, Manuscript 2016.

Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.

Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.

Compute the anticipated FDR

Description

Compute the anticipated FDR for given sample size, p-value threshold, and effect size.

Usage

afdr (n, alpha, pow.func, eff.size, lam = 0.95, eps = 1e-04, 
    ...)

Arguments

n

sample size (scalar)

alpha

p-value cut-off (scalar)

pow.func

an R function that computes statistical power

eff.size

effect size vector

lam

p-value at which to evaluate ensemble PDF (for pi.star)

eps

epsilon for numerical differentiation

...

additional arguments for the functions

Details

The aFDR is defined by Pounds and Cheng (2005) as the anticipated false discovery rate incurred by performing all tests with p-value threshold alpha, given the sample size, effect size, and power function.
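As a rough illustration of this kind of quantity, the sketch below computes an expected FDR at a fixed p-value threshold as the expected number of false rejections divided by the expected number of all rejections. It is not the package's implementation: pow_t2 and expected_fdr are hypothetical names, and the sketch assumes the true null proportion is known, whereas the lam and eps arguments suggest that afdr works with an anticipated null proportion estimate instead.

```r
# Hypothetical two-sample t-test power (noncentral t, equal n and variance);
# a stand-in with the (n, alpha, eff.size) argument order, not package code.
pow_t2 <- function(n, alpha, delta, sigma = 1) {
  df <- 2 * n - 2
  ncp <- delta / (sigma * sqrt(2 / n))
  tcrit <- qt(alpha / 2, df, lower.tail = FALSE)
  pt(-tcrit, df, ncp = ncp) + pt(tcrit, df, ncp = ncp, lower.tail = FALSE)
}

# Expected false rejections divided by expected total rejections.
# Assumption: the true null proportion is used, with no pi0 estimate.
expected_fdr <- function(n, alpha, pow.func, eff.size, null.effect = 0, ...) {
  reject.prob <- pow.func(n, alpha, eff.size, ...)  # equals alpha for null tests
  is.null <- eff.size == null.effect
  sum(reject.prob[is.null]) / sum(reject.prob)
}

eff <- rep(c(1, 0), c(100, 900))
fdr.50 <- expected_fdr(50, 0.01, pow_t2, eff)
fdr.50
```

Raising alpha in this sketch admits more null tests among the rejections, so the expected FDR grows with the threshold.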

Value

the aFDR

References

Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.

Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.

Examples

 afdr(n=50,alpha=0.01,pow.func=power.twosampt,eff.size=rep(c(1,0),c(100,900)))

Find the fixed p-value threshold that controls the FDR at a specified level

Description

Find the p-value threshold that satisfies an FDR requirement (if such a threshold exists)

Usage

alpha.fdr (fdr, n, pow.func, eff.size, null.effect, lam = 0.95, 
    eps = 1e-04, tol = 1e-04, ...)

Arguments

fdr

Desired FDR, scalar

n

sample size

pow.func

an R function to compute statistical power

eff.size

effect size vector

null.effect

value of effect size that corresponds to the null hypothesis

lam

the lambda parameter in computing the pi0 (proportion of tests with a true null) estimate of Storey (2002)

eps

epsilon for numerical differentiation

tol

tolerance for bisection solution to alpha

...

additional arguments for the functions

Value

a list with the following components:

fdr

the FDR at that alpha

alpha

the determined alpha

OK

indicates if the requirement is met

References

A Onar-Thomas, S Pounds "FDRsampsize: An R package to Perform Generalized Power and Sample Size Calculations for Planning Studies that use the False Discovery Rate to Measure Significance", Manuscript 2015.

Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.

Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.

Examples

 alpha.fdr(fdr=0.1,n=50,pow.func=power.twosampt,
           eff.size=rep(0:1,c(900,100)),null.effect=0)
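The search for a threshold can be pictured as a bisection on alpha. The sketch below is an assumption-laden illustration rather than the package code: pow_t2 and find_alpha are hypothetical names, the expected FDR is computed from the true null proportion, and the FDR is assumed to increase with alpha over the search interval.

```r
# Hypothetical two-sample t-test power (noncentral t); not package code.
pow_t2 <- function(n, alpha, delta, sigma = 1) {
  df <- 2 * n - 2
  ncp <- delta / (sigma * sqrt(2 / n))
  tcrit <- qt(alpha / 2, df, lower.tail = FALSE)
  pt(-tcrit, df, ncp = ncp) + pt(tcrit, df, ncp = ncp, lower.tail = FALSE)
}

# Bisection on alpha: lo always holds a threshold whose expected FDR is at or
# below the target (or 0 if no visited midpoint met the target).
find_alpha <- function(fdr, n, pow.func, eff.size, null.effect = 0,
                       tol = 1e-6, ...) {
  is.null <- eff.size == null.effect
  fdr_at <- function(a) {
    pw <- pow.func(n, a, eff.size, ...)
    sum(pw[is.null]) / sum(pw)
  }
  lo <- 0
  hi <- 1
  while (hi - lo > tol) {
    mid <- (lo + hi) / 2
    if (fdr_at(mid) > fdr) hi <- mid else lo <- mid
  }
  ok <- lo > 0
  list(alpha = lo, fdr = if (ok) fdr_at(lo) else NA_real_, OK = ok)
}

res <- find_alpha(0.1, 50, pow_t2, rep(0:1, c(900, 100)))
res
```

The OK flag in this sketch is FALSE when every midpoint visited exceeds the target, which mirrors the "if such a threshold exists" caveat in the Description.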

Find the p-value threshold that gives a specified average power

Description

Find p-value cut-off that yields desired average power given n and effect size

Usage

alpha.power (ave.pow, n, pow.func, eff.size, null.effect, tol = 1e-06, 
    ...)

Arguments

ave.pow

desired average power (scalar)

n

sample size

pow.func

name of R function to compute statistical power

eff.size

effect size vector

null.effect

value of effect size that corresponds to null hypothesis

tol

tolerance for bisection solution to alpha

...

additional arguments for the functions

Value

a list with the following components:

alpha

desired value of alpha

ave.pow

average power at that alpha

References

A Onar-Thomas, S Pounds. "FDRsampsize: An R package to Perform Generalized Power and Sample Size Calculations for Planning Studies that use the False Discovery Rate to Measure Significance", Manuscript 2015.

Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.

Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.

Examples

 alpha.power(ave.pow=0.8,n=50,pow.func=power.twosampt,
             eff.size=rep(0:1,c(900,100)),null.effect=0)

Compute average power for a given sample size

Description

Compute average power for given sample size, effect size, and p-value threshold

Usage

average.power (n, alpha, pow.func, eff.size, null.effect, ...)

Arguments

n

sample size

alpha

p-value cut off (scalar)

pow.func

an R function to compute statistical power

eff.size

effect size vector

null.effect

value of effect size that corresponds to null hypothesis

...

additional arguments for the functions

Value

average power (scalar)
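A minimal sketch of the idea, assuming that average power means the mean power over the tests whose effect size differs from null.effect. The names pow_t2 and avg_power are hypothetical and the code is not taken from the package.

```r
# Hypothetical two-sample t-test power (noncentral t); not package code.
pow_t2 <- function(n, alpha, delta, sigma = 1) {
  df <- 2 * n - 2
  ncp <- delta / (sigma * sqrt(2 / n))
  tcrit <- qt(alpha / 2, df, lower.tail = FALSE)
  pt(-tcrit, df, ncp = ncp) + pt(tcrit, df, ncp = ncp, lower.tail = FALSE)
}

# Mean power over the tests with a false null hypothesis.
avg_power <- function(n, alpha, pow.func, eff.size, null.effect, ...) {
  alt <- eff.size != null.effect
  mean(pow.func(n, alpha, eff.size[alt], ...))
}

ap <- avg_power(50, 0.01, pow_t2, rep(0:1, c(900, 100)), null.effect = 0)
ap
```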

References

Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.

Gadbury GL, et al. (2004) Power and sample size estimation in high dimensional biology. Statistical Methods in Medical Research 13(4):325-38.

Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.

Examples

 average.power(n=50,alpha=0.01,pow.func=power.twosampt,
               eff.size=rep(0:1,c(900,100)),null.effect=0)

Compute the average power at a specific FDR control level

Description

Compute the average power at a specific level of FDR control for a given effect size and sample size

Usage

fdr.power (fdr, n, pow.func, eff.size, null.effect, lam = 0.95, 
    eps = 1e-04, tol = 1e-04, ...)

Arguments

fdr

Desired FDR, scalar

n

sample size

pow.func

name of R function to compute statistical power

eff.size

effect size vector; will be provided as the third argument of pow.func

null.effect

value of effect size that corresponds to null hypothesis

lam

the lambda parameter in computing the pi0 (proportion of tests with a true null) estimate of Storey (2002)

eps

epsilon for numerical differentiation

tol

tolerance for bisection solution to alpha

...

additional arguments for the functions

Value

average power (scalar) of the tests with a false null hypothesis

References

Gadbury GL, et al. (2004) Power and sample size estimation in high dimensional biology. Statistical Methods in Medical Research 13(4):325-38.

Pounds S and Cheng C (2005) Sample size determination for the false discovery rate. Bioinformatics 21(23): 4263-71.

Examples

 fdr.power(fdr=0.10,n=50,pow.func=power.twosampt,
           eff.size=rep(0:1,c(900,100)),null.effect=0)

Determine sample size required to achieve a desired average power while controlling the FDR at a specified level.

Description

Determines the sample size needed to achieve a desired average power while controlling the FDR at a specified level.

Usage

fdr.sampsize (fdr, ave.pow, pow.func, eff.size, null.effect, max.n = 500, 
    min.n = 5, tol = 1e-05, eps = 1e-05, lam = 0.95, ...) 
## S3 method for class 'fdr.sampsize'
print(x,...)

Arguments

fdr

Desired FDR (scalar numeric)

ave.pow

Desired average power (scalar numeric)

pow.func

Character string name of function to compute power; must accept n, alpha, and eff.size as its first three arguments. Other optional arguments may also be provided.

eff.size

Numeric vector of effect sizes; internally, this will be provided as the third argument of pow.func

null.effect

Scalar value of the effect size under the null hypothesis. This may be 0 or 1 for tests that respectively use differences or ratios for comparisons.

max.n

Maximum n to consider

min.n

Minimum n to consider

tol

Tolerance for bisection calculations

eps

Epsilon for numerical differentiation

lam

Lambda for computing anticipated pi0 estimate, see Storey 2002.

x

result of the fdr.sampsize function

...

additional arguments for pow.func

Details

This function checks the technical conditions regarding whether the desired FDR can be achieved by min.n or max.n before calling n.fdr. Thus, for most applications, fdr.sampsize should be used instead of n.fdr.
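Because pow.func only needs to accept n, alpha, and the effect size as its first three arguments, a user-written power function can follow the same pattern as the built-in ones. The sketch below defines a hypothetical power.ztest for a two-sided one-sample z-test; it is an illustration of the interface, not a function provided by the package.

```r
# Hypothetical user-defined power function with the (n, alpha, eff.size, ...)
# argument order described for pow.func. Two-sided one-sample z-test.
power.ztest <- function(n, alpha, delta, sigma = 1) {
  z <- qnorm(alpha / 2, lower.tail = FALSE)
  shift <- sqrt(n) * delta / sigma
  pnorm(shift - z) + pnorm(-shift - z)   # upper-tail plus lower-tail rejections
}

# Vectorized over the effect size; power equals alpha at the null effect.
pw <- power.ztest(10, 0.05, c(0, 0.5, 1))
pw

# With the package loaded, this function could then be supplied as
# pow.func = power.ztest with null.effect = 0 (not run here).
```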

Value

fdr.sampsize returns an object of class 'FDRsampsize', which is a list with the following components:

OK

indicates if the requirement is met

n

the computed sample size

alpha

the p-value threshold that gives the desired FDR

fdr.hat

anticipated value of the FDR estimate given n and effect size

act.fdr

actual expected FDR given n and effect size

ave.pow

average power

act.pi

actual value of pi0, the proportion of tests with a true null hypothesis.

pi.hat

expected value of the pi0 estimate

eff.size

input effect size vector

References

Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.

Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.

Examples

 power.twosampt             # show the power.twosampt function
 res=fdr.sampsize(fdr=0.1,
                  ave.pow=0.8,
                  pow.func=power.twosampt,
                  eff.size=rep(c(1,0),c(10,990)),
                  null.effect=0)
 res

Find the sample size that meets desired FDR and power criteria

Description

Find smallest sample size that meets requirements for average power and FDR

Usage

n.fdr (ave.pow, fdr, pow.func, eff.size, null.effect, lam = 0.95, 
    eps = 1e-04, n0 = 5, n1 = 500, tol = 1e-06, ...)

Arguments

ave.pow

required average power (scalar)

fdr

required FDR (scalar)

pow.func

name of R function that computes statistical power

eff.size

effect size vector

null.effect

value of effect size that corresponds to the null hypothesis

lam

p-value at which to evaluate ensemble PDF

eps

epsilon for numerical differentiation

n0

smallest sample size to be considered for bisection

n1

maximum sample size to be considered for bisection

tol

tolerance for solving for alpha in iterations

...

additional arguments for the functions

Details

This performs the sample size calculation without checking whether the minimum or maximum sample size satisfies the desired requirements. The fdr.sampsize function checks these conditions and then calls n.fdr. Thus, many users may prefer to use the fdr.sampsize function instead of n.fdr.
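To make the nested calculation concrete, the sketch below scans sample sizes in increasing order, finds for each n a p-value threshold that meets the FDR target, and stops at the first n whose average power reaches the requirement. It is only an illustration under stated assumptions: the names are hypothetical, a linear scan replaces the bisection over n that the argument descriptions mention, and the expected FDR uses the true null proportion.

```r
# Hypothetical two-sample t-test power (noncentral t); not package code.
pow_t2 <- function(n, alpha, delta, sigma = 1) {
  df <- 2 * n - 2
  ncp <- delta / (sigma * sqrt(2 / n))
  tcrit <- qt(alpha / 2, df, lower.tail = FALSE)
  pt(-tcrit, df, ncp = ncp) + pt(tcrit, df, ncp = ncp, lower.tail = FALSE)
}

# Bisection for an alpha whose expected FDR is at or below the target;
# returns 0 when no visited midpoint meets the target.
alpha_for_fdr <- function(fdr, n, eff.size, null.effect, tol = 1e-6) {
  is.null <- eff.size == null.effect
  lo <- 0
  hi <- 1
  while (hi - lo > tol) {
    mid <- (lo + hi) / 2
    pw <- pow_t2(n, mid, eff.size)
    if (sum(pw[is.null]) / sum(pw) > fdr) hi <- mid else lo <- mid
  }
  lo
}

# Linear scan over n (a simpler stand-in for bisection over n).
smallest_n <- function(ave.pow, fdr, eff.size, null.effect, n0 = 5, n1 = 500) {
  alt <- eff.size != null.effect
  for (n in n0:n1) {
    a <- alpha_for_fdr(fdr, n, eff.size, null.effect)
    if (a > 0) {
      pw <- mean(pow_t2(n, a, eff.size[alt]))
      if (pw >= ave.pow) return(list(n = n, alpha = a, ave.pow = pw))
    }
  }
  NULL  # no sample size in [n0, n1] met both requirements
}

res <- smallest_n(0.8, 0.1, rep(c(1, 0), c(100, 900)), null.effect = 0)
res
```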

Value

a list with the following components:

n

a sample size estimate

alpha

the p-value cut-off

fdr.hat

an approximation to the expected value of the FDR estimate given n

ave.pow

the average power

fdr.act

the actual FDR given n

pi.hat

expected value of the pi.hat estimator given n

act.pi

actual pi0

References

Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.

Jung, Sin-Ho. "Sample size for FDR-control in microarray data analysis." Bioinformatics 21.14 (2005): 3097-3104.

Compute the anticipated null proportion estimate

Description

Compute an approximation of the expected value of the null proportion estimate given the sample size and effect size.

Usage

pi.star (n, pow.func, eff.size, lam = 0.95, eps = 1e-04, ...)

Arguments

n

sample size

pow.func

an R function to compute statistical power

eff.size

effect size vector

lam

p-value at which to numerically evaluate p-value pdf (scalar)

eps

epsilon for numerical differentiation

...

additional arguments for the functions

Value

scalar value for approximated E(pi.hat)
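Reading the argument descriptions literally (lam is the p-value at which the ensemble p-value PDF is evaluated and eps is the step for numerical differentiation), one plausible sketch treats the ensemble CDF as the mean rejection probability across all tests and differentiates it at lam by a central difference. The names below are hypothetical and the package's actual computation may differ.

```r
# Hypothetical two-sample t-test power (noncentral t); not package code.
pow_t2 <- function(n, alpha, delta, sigma = 1) {
  df <- 2 * n - 2
  ncp <- delta / (sigma * sqrt(2 / n))
  tcrit <- qt(alpha / 2, df, lower.tail = FALSE)
  pt(-tcrit, df, ncp = ncp) + pt(tcrit, df, ncp = ncp, lower.tail = FALSE)
}

# Ensemble p-value CDF at a: mean probability that a p-value falls below a.
ensemble_cdf <- function(a, n, pow.func, eff.size, ...) {
  mean(pow.func(n, a, eff.size, ...))
}

# Central-difference estimate of the ensemble p-value PDF at lam.
pi_star_sketch <- function(n, pow.func, eff.size, lam = 0.95, eps = 1e-4, ...) {
  upper <- ensemble_cdf(lam + eps, n, pow.func, eff.size, ...)
  lower <- ensemble_cdf(lam - eps, n, pow.func, eff.size, ...)
  (upper - lower) / (2 * eps)
}

ps.mixed <- pi_star_sketch(50, pow_t2, rep(c(1, 0), c(100, 900)))
ps.null <- pi_star_sketch(50, pow_t2, rep(0, 1000))
c(mixed = ps.mixed, all.null = ps.null)
```

In this sketch the null tests contribute a uniform p-value density of 1, so the value stays close to the true null proportion when the non-null tests have high power.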

References

Pounds, Stan, and Cheng Cheng. "Sample size determination for the false discovery rate." Bioinformatics 21.23 (2005): 4263-4271.

Compute the power of a single-predictor Cox regression model

Description

Use the formula of Hsieh and Lavori (2000) to compute the power of a single-predictor Cox model.

Usage

power.cox (n, alpha, logHR, v)

Arguments

n

number of events (scalar)

alpha

p-value threshold (scalar)

logHR

log hazard ratio (vector)

v

variance of predictor variable (vector)

Value

vector of power estimates for two-sided test
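For orientation, the Hsieh and Lavori (2000) result gives the required number of events as (z_alpha + z_beta)^2 / (v * logHR^2) for a single covariate. The sketch below inverts that expression into a normal-approximation power. It is a hypothetical illustration (cox_power is not a package function), it uses the two-sided critical value, and it ignores the negligible rejection probability in the opposite tail.

```r
# Hypothetical normal-approximation power for a single-predictor Cox model,
# obtained by inverting events = (z_alpha + z_beta)^2 / (v * logHR^2).
cox_power <- function(n, alpha, logHR, v) {
  z <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(sqrt(n * v) * abs(logHR) - z)
}

# Round trip: events needed for 80% power at two-sided alpha = 0.05, HR = 2, v = 1.
events <- (qnorm(0.975) + qnorm(0.8))^2 / (1 * log(2)^2)
pw.cox <- cox_power(events, 0.05, log(2), 1)
pw.cox
```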

References

Hsieh, FY and Lavori, Philip W (2000) Sample-size calculations for the Cox proportional hazards regression model with nonbinary covariates. Controlled Clinical Trials 21(6):552-560.

Examples

 power.cox             # show the power.cox function
 res=fdr.sampsize(fdr=0.1,
                  ave.pow=0.8,
                  pow.func=power.cox,
                  eff.size=rep(c(log(2),0),c(100,900)),
                  null.effect=0,
                  v=1)
 res

Compute Power for RNA-seq Experiments Assuming Negative Binomial Distribution.

Description

Use the formula of Hart et al (2013) to compute power for comparing RNA-seq expression across two groups assuming a negative binomial distribution.

Usage

power.hart (n, alpha, log.fc, mu, sig)

Arguments

n

per-group sample size (scalar)

alpha

p-value threshold (scalar)

log.fc

log fold-change (vector), usual null hypothesis is log.fc=0

mu

read depth per gene (vector, same length as log.fc)

sig

coefficient of variation (CV) per gene (vector, same length as log.fc)

Details

This function is based on equation (1) of Hart et al (2013). It assumes a negative binomial model for RNA-seq read counts and equal sample size per group.
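The sample size expression used in the Examples, n = 2 (z_{1-alpha/2} + z_{power})^2 (1/mu + sig^2) / log.fc^2, can be inverted to give a normal-approximation power. The sketch below does that inversion. It is a hypothetical illustration (hart_power is not the package function) and it ignores the small rejection probability in the opposite tail.

```r
# Hypothetical normal-approximation power under the negative binomial model:
# the standard error of the log fold-change estimate is sqrt(2 * (1/mu + sig^2) / n).
hart_power <- function(n, alpha, log.fc, mu, sig) {
  se <- sqrt(2 * (1 / mu + sig^2) / n)
  z <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(abs(log.fc) / se - z)
}

# Round trip with the sample size expression from the Examples section.
n.hart <- 2 * (qnorm(0.975) + qnorm(0.9))^2 * (1 / 20 + 0.6^2) / (log(2)^2)
pw.hart <- hart_power(n.hart, 0.05, log(2), 20, 0.6)
pw.hart
```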

Value

vector of power estimates for the set of two-sided tests

References

SN Hart, TM Therneau, Y Zhang, GA Poland, and J-P Kocher (2013). Calculating Sample Size Estimates for RNA Sequencing Data. Journal of Computational Biology 20: 970-978.

Examples

 power.hart       # show the power function
 n.hart=2*(qnorm(0.975)+qnorm(0.9))^2*(1/20+0.6^2)/(log(2)^2) # Equation 6 of Hart et al
 power.hart(n.hart,0.05,log(2),20,0.6)                        # Recapitulate 90% power  
 res=fdr.sampsize(fdr=0.1,
                  ave.pow=0.8,
                  pow.func=power.hart,
                  eff.size=rep(c(log(2),0),c(100,900)),
                  null.effect=0,mu=5,sig=1)
 res

Compute Power for RNA-Seq Experiments Assuming Poisson Distribution

Description

Use the formula of Li et al (2013) to compute power for comparing RNA-seq expression across two groups assuming the Poisson distribution.

Usage

power.li (n, alpha, rho, mu0, w = 1, type = "w")

Arguments

n

per-group sample size

alpha

p-value threshold

rho

fold-change, usual null hypothesis is that rho=1

mu0

average count in control group

w

ratio of total number of

type

type of test: "w" for Wald, "s" for score, "lw" for log-transformed Wald, "ls" for log-transformed score.

Details

This function computes the power for each of a series of two-sided tests defined by the input parameters. The power is based on the sample size formulas in equations 10-13 of Li et al (2013). Also, note that the null.effect is set to 1 in the examples because the usual null hypothesis is that the fold-change = 1.

Value

vector of power estimates for two-sided tests

References

C-I Li, P-F Su, Y Guo, and Y Shyr (2013). Sample size calculation for differential expression analysis of RNA-seq data under Poisson distribution. Int J Comput Biol Drug Des 6(4). doi:10.1504/IJCBDD.2013.056830

Examples

 power.li      # show the power function
 power.li(88,0.05,1.25,5,0.5,"w")  # recapitulate 80% power in Table 1 of Li et al (2013)
 res=fdr.sampsize(fdr=0.1,
                  ave.pow=0.8,
                  pow.func=power.li,
                  eff.size=rep(c(1.5,1),c(100,900)),
                  null.effect=1,
                  mu0=5,w=1,type="w")
 res

Compute power of the one-sample t-test

Description

Estimate power of the one-sample t-test. Uses the classical power formula for the one-sample t-test.

Usage

power.onesampt (n, alpha, delta, sigma = 1)

Arguments

n

sample size (scalar)

alpha

p-value threshold (scalar)

delta

difference of actual mean from null mean (vector)

sigma

standard deviation (vector or scalar, default=1)

Value

vector of power estimates for two-sided test

Examples

 power.onesampt        # show the power function
 res=fdr.sampsize(fdr=0.1,
                  ave.pow=0.8,
                  pow.func=power.onesampt,
                  eff.size=rep(c(2,0),c(100,900)),
                  null.effect=0,
                  sigma=1)
 res

Compute power of one-way ANOVA

Description

Compute power of one-way ANOVA. Uses the classical power formula for ANOVA. Assumes equal variance and sample size.

Usage

power.oneway (n, alpha, theta, k = 2)

Arguments

n

per-group sample size (scalar)

alpha

p-value threshold (scalar)

theta

sum of ((group mean - overall mean)/stdev)^2 across all groups for each hypothesis test (vector)

k

the number of groups to be compared, default k=2

Details

For many applications, the null effect is zero for the parameter theta described above.
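With theta defined as above, the classical one-way ANOVA power can be written with a noncentral F distribution whose noncentrality parameter is n * theta. The sketch below is a hypothetical illustration of that textbook formula (oneway_power is not the package function), assuming equal group sizes and a common variance.

```r
# Hypothetical noncentral-F power for one-way ANOVA with k groups of size n.
oneway_power <- function(n, alpha, theta, k = 2) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  fcrit <- qf(alpha, df1, df2, lower.tail = FALSE)
  pf(fcrit, df1, df2, ncp = n * theta, lower.tail = FALSE)
}

# theta = 0 is the null effect, so the power there is (numerically) alpha.
pw.aov <- oneway_power(20, 0.05, c(0, 0.5, 2), k = 3)
pw.aov
```

For k = 2 and group means that differ by delta, theta equals delta^2 / (2 * sigma^2), and this sketch reproduces the two-sided two-sample t-test power.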

Value

vector of power estimates for test of equal means

Examples

 power.oneway        # show the power function
 res=fdr.sampsize(fdr=0.1,
                  ave.pow=0.8,
                  pow.func=power.oneway,
                  eff.size=rep(c(2,0),c(100,900)),
                  null.effect=0,
                  k=3)
 res

Compute power of the rank-sum test

Description

Compute power of rank-sum test. Uses the formula of Noether (JASA 1987).

Usage

power.ranksum (n, alpha, p)

Arguments

n

sample size (scalar)

alpha

p-value threshold (scalar)

p

Pr (Y>X), as in Noether (JASA 1987)

Details

In most applications, the null effect size will be designated by p = 0.5. Thus, in the example below, the argument null.effect=0.5 is specified in the call to fdr.sampsize.

Value

vector of power estimates for two-sided tests

References

Noether, Gottfried E (1987) Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645-647.

Examples

 power.ranksum        # show the power function
 res=fdr.sampsize(fdr=0.1,
                  ave.pow=0.8,
                  pow.func=power.ranksum,
                  eff.size=rep(c(0.8,0.5),c(100,900)),
                  null.effect=0.5)
 res

Compute power of the sign test

Description

Use the Noether (1987) formula to compute the power of the sign test

Usage

power.signtest (n, alpha, p)

Arguments

n

sample size (scalar)

alpha

p-value threshold (scalar)

p

Pr (Y>X), as in Noether (JASA 1987)

Details

In most applications, the null effect size will be designated by p = 0.5 instead of p = 0. Thus, in the call to fdr.sampsize, we specify null.effect=0.5 in the example below.
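Noether (1987) gives the sign test sample size as N = (z_alpha + z_beta)^2 / (4 (p - 1/2)^2). The sketch below inverts this into a normal-approximation power. It is a hypothetical illustration (sign_power is not the package function), it uses the two-sided critical value, and it counts only the tail on the side of the alternative.

```r
# Hypothetical normal-approximation power for the sign test, obtained by
# inverting N = (z_alpha + z_beta)^2 / (4 * (p - 1/2)^2).
sign_power <- function(n, alpha, p) {
  z <- qnorm(alpha / 2, lower.tail = FALSE)
  pnorm(2 * sqrt(n) * abs(p - 0.5) - z)
}

# Round trip: N for 80% power at two-sided alpha = 0.05 and p = 0.8.
n.sign <- (qnorm(0.975) + qnorm(0.8))^2 / (4 * (0.8 - 0.5)^2)
pw.sign <- sign_power(n.sign, 0.05, 0.8)
pw.sign
```

Because the opposite tail is dropped, this sketch returns alpha / 2 rather than alpha at p = 0.5.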

Value

vector of power estimates for two-sided tests

References

Noether, Gottfried E (1987) Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645-647.

Examples

 power.signtest        # show the power function
 res=fdr.sampsize(fdr=0.1,
                  ave.pow=0.8,
                  pow.func=power.signtest,
                  eff.size=rep(c(0.8,0.5),c(100,900)),
                  null.effect=0.5)
 res

Compute Power of the t-test for non-zero correlation

Description

Estimate power of t-test for non-zero correlation. Uses the classical power formula for the t-test.

Usage

power.tcorr (n, alpha, rho)

Arguments

n

sample size (scalar)

alpha

p-value threshold (scalar)

rho

population correlation coefficient (vector)

Details

For many applications, the null.effect is rho=0.

Value

vector of power estimates for two-sided tests

Examples

 power.tcorr        # show the power function
 res=fdr.sampsize(fdr=0.1,
                  ave.pow=0.8,
                  pow.func=power.tcorr,
                  eff.size=rep(c(0.3,0),c(100,900)),
                  null.effect=0)
 res

Compute power of the two-sample t-test

Description

Estimate power of the two-sample t-test. Uses the classical power formula for the two-sample t-test. Assumes equal variance and sample size.

Usage

power.twosampt (n, alpha, delta, sigma = 1)

Arguments

n

per-group sample size (scalar)

alpha

p-value threshold (scalar)

delta

difference between population means (vector)

sigma

standard deviation (vector or scalar)

Details

For many applications, the null.effect is zero difference of means.
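The classical formula can be expressed with a noncentral t distribution with 2n - 2 degrees of freedom and noncentrality delta / (sigma * sqrt(2 / n)). The sketch below is a hypothetical illustration of that formula (pow_t2 is not the package function), assuming equal group sizes and a common standard deviation.

```r
# Hypothetical two-sided two-sample t-test power via the noncentral t
# distribution, vectorized over delta.
pow_t2 <- function(n, alpha, delta, sigma = 1) {
  df <- 2 * n - 2
  ncp <- delta / (sigma * sqrt(2 / n))
  tcrit <- qt(alpha / 2, df, lower.tail = FALSE)
  pt(-tcrit, df, ncp = ncp) + pt(tcrit, df, ncp = ncp, lower.tail = FALSE)
}

pw.t <- pow_t2(20, 0.05, c(0, 0.5, 1))
pw.t

# Cross-check against stats::power.t.test, which counts both tails when strict = TRUE.
ref <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05, strict = TRUE)$power
```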

Value

vector of power estimates for two-sided test

Examples

 power.twosampt        # show the power function
 res=fdr.sampsize(fdr=0.1,
                  ave.pow=0.8,
                  pow.func=power.twosampt,
                  eff.size=rep(c(2,0),c(100,900)),
                  null.effect=0,
                  sigma=1)
 res