Help for package MiSPU

Type:

Package

Title:

Microbiome Based Sum of Powered Score (MiSPU) Tests

Version:

1.0

Date:

2016-02-29

Author:

Chong Wu, Wei Pan

Maintainer:

Chong Wu <wuxx0845@umn.edu>

Description:

There is an increasing interest in investigating how the compositions of microbial communities are associated with human health and disease. In this package, we present a novel global testing method called aMiSPU, that is highly adaptive and thus high powered across various scenarios, alleviating the issue with the choice of a phylogenetic distance. Our simulations and real data analysis demonstrated that aMiSPU test was often more powerful than several competing methods while correctly controlling type I error rates.

License:

GPL-2

Imports:

Rcpp (≥ 0.12.1)

Depends:

R (≥ 3.2.3), vegan, ape, aSPU,cluster

Suggests:

ade4

LinkingTo:

Rcpp, RcppArmadillo

NeedsCompilation:

yes

Packaged:

2016-03-17 18:15:15 UTC; chong

Repository:

CRAN

Date/Publication:

2016-03-18 00:08:39

Microbiome Based Sum of Powered Score (MiSPU) Tests

Description

Details

Package:	MiSPU
Type:	Package
Version:	1.0
Date:	2016-02-29
License:	GPL-2

Author(s)

Chong Wu, Wei Pan Maintainer: Chong Wu <wuxx0845@umn.edu>

References

Pan, W., et al.(2014) A powerful and adaptive association test for rare variants, Genetics, 197(4), 1081-95

Chong, W., Pan, W. (2015) An Adaptive Association Test for Microbiome Data, submitted.

Examples

data(throat.otu.tab)
data(throat.tree)
data(throat.meta)

Y.tmp =throat.meta[,3]
Y = rep(0,dim(throat.meta)[1])
Y[Y.tmp=="Smoker"] = 1

cov.tmp = throat.meta[,c(10,12)]
cov = matrix(1,dim(throat.meta)[1],2)
cov[cov.tmp[,1]== "None",1] = 0
cov[cov.tmp[,2]== "Male",2] = 0

start.time = proc.time()
X = as.matrix(throat.otu.tab)

out = MiSPU(Y,X, throat.tree,cov,model =  "binomial", pow = c(2:8, Inf), n.perm = 1000)
out

The estimate of Dirichlet-multinomial distribution

Description

The estimate of Dirichlet-multinomial distribution. It just intermidates estimate.

Usage

data(dd)

Format

The format is: matrix

Details

It just intermidates estimate. Not very useful.

Examples

data(dd)

The Dirichlet Distribution

Description

Density function and random number generation for the Dirichlet distribution

Usage

rdirichlet(n, alpha)

Arguments

n

number of random observations to draw

alpha

the Dirichlet distribution's parameters. Can be a vector (one set of parameters for all observations) or a matrix (a different set of parameters for each observation), see “Details”

Details

The Dirichlet distribution is a multidimensional generalization of the Beta distribution where each dimension is governed by an \alpha-parameter. Formally this is

% \mathcal{D}(\alpha_i)=\left[\left.\Gamma(\sum_{i}\alpha_i)\right/\prod_i\Gamma(\alpha_i)\right]\prod_{i}y_i^{\alpha_i-1}%

Usually, alpha is a vector thus the same parameters will be used for all observations. If alpha is a matrix, a complete set of \alpha-parameters must be supplied for each observation.

Value

returns a matrix with random numbers according to the supplied alpha vector or matrix.

Author(s)

Chong Wu

Examples

X1 <- rdirichlet(100, c(5, 5, 10))
X1

Generalized UniFrac distances for comparing microbial communities.

Description

A generalized version of commonly used UniFrac distances. It is defined as:

d^{(\alpha)} = \frac{\sum_{i=1}^m b_i (p^A_{i} + p^B_{i})^\alpha \left\vert \frac{ p^A_{i} - p^B_{i} }{p^A_{i} + p^B_{i}} \right\vert } { \sum_{i=1}^m b_i (p^A_{i} + p^B_{i})^\alpha},

where m is the number of branches, b_i is the length of ith branch, p^A_{i}, p^B_{i} are the branch proportion for community A and B.

Generalized UniFrac distance contains an extra parameter \alpha controlling the weight on abundant lineages so the distance is not dominated by highly abundant lineages. \alpha=0.5 has overall the best power.

Usage

GUniFrac(otu.tab, tree,alpha = c(0,0.5,1))

Arguments

otu.tab

OTU count table, row - n sample, column - q OTU

tree

Rooted phylogenetic tree of R class “phylo”

alpha

Parameter controlling weight on abundant lineages

Value

Return a list containing

d0

UniFrac(0)

d5

UniFrac(0.5)

d1

UniFrac(1), weighted UniFrac

or a list containing

GUniFrac

The distance matrix for different alpha

alpha

The weight

Note

The time consuming part is written in C and faster than the original one. The function only accepts rooted tree.

Author(s)

Chong Wu <chongwu@umn.edu>

References

Chen, Jun, et al (2012). "Associating microbiome composition with environmental covariates using generalized UniFrac distances." Bioinformatics 28(16):2106-2113.

Examples

data(throat.otu.tab)
data(throat.tree)
data(throat.meta)

groups <- throat.meta$SmokingStatus

# Calculate the UniFracs
unifracs <- GUniFrac(throat.otu.tab, throat.tree)
unifracs

microbiome based sum of powered score (MiSPU)

Description

We propose a class of microbiome based sum of powered score (MiSPU) tests based on a newly defined generalized taxon proportion that combines observed microbial composition information with phylogenetic tree information. Different from the existing methods, a MiSPU test is based on a weighted score of the generalized taxon proportion in a general framework of regression, upweighting more likely to be associated microbial lineages. Our simulations demonstrated that one or more MiSPU tests were more powerful than MiRKAT while correctly controlling type I error rates. An adaptive MiSPU (aMiSPU) test is proposed to combine multiple MiSPU tests with various weights, approximating the most powerful MiSPU for a given scenario, consequently being highly adaptive and high powered across various scenarios.

Usage

MiSPU(y, X, tree, cov = NULL,model = c("gaussian", "binomial"),
 pow = c(2:8, Inf), n.perm = 1000)

Arguments

y

Outcome of interest. It can be a disease indicator; =0 for controls, =1 for cases. Or it can be a quantitative trait. A vector with length n (number of observations).

X

OTU count table, row - n sample, column - q OTU

tree

Rooted phylogenetic tree of R class “phylo”

cov

Covariates. A matrix with dimension n by p (n :number of observation, p : number of covariates).

model

Use "gaussian" for a quantitative trait, and use "binomial" for a binary trait.

pow

The gamma which controls the weight. Larger pow puts more weight on the variables that have larger absolute score.

n.perm

number of permutations or bootstraps.

Value

A list object, including the results for MiSPU_u, MiSPU_w and aMiSPU.

Author(s)

Chong Wu

References

Pan, W., et al.(2014) A powerful and adaptive association test for rare variants, Genetics, 197(4), 1081-95

Chong, W., Pan, W. (2015) An Adaptive Association Test for Microbiome Data, submitted.

Examples

data(throat.otu.tab)
data(throat.tree)
data(throat.meta)

Y.tmp =throat.meta[,3]
Y = rep(0,dim(throat.meta)[1])
Y[Y.tmp=="Smoker"] = 1

cov.tmp = throat.meta[,c(10,12)]
cov = matrix(1,dim(throat.meta)[1],2)
cov[cov.tmp[,1]== "None",1] = 0
cov[cov.tmp[,2]== "Male",2] = 0

start.time = proc.time()
X = as.matrix(throat.otu.tab)

out = MiSPU(Y,X, throat.tree,cov,model =  "binomial", pow = c(2:8, Inf), n.perm = 1000)
out

Finding the correspondence OTUs for each taxa

Description

Finding the descendants for each taxa.

Usage

correspLeaves(tree)

Arguments

tree

Rooted phylogenetic tree of R class “phylo”

Value

A list containing the descendants for each taxa.

Author(s)

Chong Wu

References

Chong, W., Pan, W. (2015) An Adaptive Association Test for Microbiome Data, submitted.

Examples

data(throat.tree)
correspLeaves(throat.tree)

ranking the OTUs

Description

Ranking the importance of each taxa.

Usage

ranking(y, X, tree, cov = NULL,gamma,g.taxon.index,model = "binomial")

Arguments

y

Outcome of interest. It can be a disease indicator; =0 for controls, =1 for cases. Or it can be a quantitative trait. A vector with length n (number of observations).

X

OTU count table, row - n sample, column - q OTU

tree

Rooted phylogenetic tree of R class “phylo”

cov

Covariates. A matrix with dimension n by p (n :number of observation, p : number of covariates).

gamma

The best gamma selected by aMiSPU test.

g.taxon.index

g.taxon.index = 1 stands for weigted generalized taxon proportion; otherwise means unweighted generalized taxon proportion.

model

Use "gaussian" for a quantitative trait, and use "binomial" for a binary trait.

Value

A matrix containing the ranking score, the higher the more important.

Author(s)

Chong Wu

References

Chong, W., Pan, W. (2015) An Adaptive Association Test for Microbiome Data, submitted.

Examples

data(throat.otu.tab)
data(throat.tree)
data(throat.meta)

Y.tmp =throat.meta[,3]
Y = rep(0,dim(throat.meta)[1])
Y[Y.tmp=="Smoker"] = 1

cov.tmp = throat.meta[,c(10,12)]
cov = matrix(1,dim(throat.meta)[1],2)
cov[cov.tmp[,1]== "None",1] = 0
cov[cov.tmp[,2]== "Male",2] = 0

start.time = proc.time()
X = as.matrix(throat.otu.tab)

#out = MiSPU(Y,X, throat.tree,cov,model =  "binomial", pow = c(2:8, Inf), n.perm = 1000)
out = ranking(Y,X, throat.tree,cov,gamma = 2, g.taxon.index =1)

OTU counts simulation

Description

We used a phylogenetic tree of OTUs from a real throat microbiome data set, which consists of 856 OTUs after discarding singleton OTUs. Then based on a complicate statistical meodel, we generated the OTU counts for each individual to simulate the feature of real microbiome data.

Usage

simulateData(nSam=100, s=12, ncluster = 20, mu = 1000, size = 25)

Arguments

nSam

Sample size, the default value is 100.

s

The indicator of informative cluster.

ncluster

Positive integer specifying the number of clusters of they phylogenetic tree.

mu

The mean of the negative binomial distribution.

size

The size of the negative binomial distribution

Value

A list object, including the informative OTU counts table and whole OTU counts table.

Author(s)

Chong Wu

References

Chong, W., Pan, W. (2015) An Adaptive Association Test for Microbiome Data, submitted.

Examples

OTU = simulateData()
OTU

Meta data of the throat microbiome samples.

Description

It is part of a microbiome data set for studying the effect of smoking on the upper respiratory tract microbiome. The original data set contains samples from both throat and nose microbiomes, and from both body sides. This data set comes from the throat microbiome of left body side. It contains 60 subjects consisting of 32 nonsmokers and 28 smokers.

Usage

data(throat.meta)

Format

The format is: chr "throat.meta"

Source

Charlson ES, Chen J, Custers-Allen R, Bittinger K, Li H, et al. (2010) Disordered Microbial Communities in the Upper Respiratory Tract of Cigarette Smokers. PLoS ONE 5(12): e15216.

Examples

data(throat.meta)
## maybe str(throat.meta) ; plot(throat.meta) ...

OTU count table from 16S sequencing of the throat microbiome samples.

Description

Usage

data(throat.otu.tab)

Format

The format is: chr "throat.otu.tab"

Details

The OTU table is produced by the QIIME software. Singleton OTUs have been discarded.

Source

Charlson ES, Chen J, Custers-Allen R, Bittinger K, Li H, et al. (2010) Disordered Microbial Communities in the Upper Respiratory Tract of Cigarette Smokers. PLoS ONE 5(12): e15216.

Examples

data(throat.otu.tab)
## maybe str(throat.otu.tab) ; plot(throat.otu.tab) ...

UPGMA tree of the OTUs from 16S sequencing of the throat microbiome samples.

Description

The OTU tree is constructed using UPGMA on the K80 distance matrice of the OTUs. It is a rooted tree of class "phylo".

Usage

data(throat.tree)

Format

The format is: chr "throat.tree"

Details

The OTUs are produced by the QIIME software. Singleton OTUs have been discarded.

Source

Charlson ES, Chen J, Custers-Allen R, Bittinger K, Li H, et al. (2010) Disordered Microbial Communities in the Upper Respiratory Tract of Cigarette Smokers. PLoS ONE 5(12): e15216.

Examples

data(throat.tree)
## maybe str(throat.tree) ; plot(throat.tree) ...