Type: | Package |
Title: | Gene-Ranking Analysis of Pathway Expression |
Version: | 0.1.1 |
Author: | Michael Klein <michael.klein@yale.edu> |
Maintainer: | Michael Klein <michael.klein@yale.edu> |
Imports: | stats |
Description: | Gene-Ranking Analysis of Pathway Expression (GRAPE) is a tool for summarizing the consensus behavior of biological pathways in the form of a template, and for quantifying the extent to which individual samples deviate from the template. GRAPE templates are based only on the relative rankings of the genes within the pathway and can be used for classification of tissue types or disease subtypes. GRAPE can be used to represent gene-expression samples as vectors of pathway scores, where each pathway score indicates the departure from a given collection of reference samples. The resulting pathway- space representation can be used as the feature set for various applications, including survival analysis and drug-response prediction. Users of GRAPE should use the following citation: Klein MI, Stern DF, and Zhao H. GRAPE: A pathway template method to characterize tissue-specific functionality from gene expression profiles. BMC Bioinformatics, 18:317 (June 2017). |
License: | GPL-2 |
LazyData: | TRUE |
RoxygenNote: | 6.1.1 |
NeedsCompilation: | no |
Packaged: | 2019-05-07 17:44:01 UTC; michaelklein |
Repository: | CRAN |
Date/Publication: | 2019-05-07 22:12:24 UTC |
Calculate Pathway Scores
Description
Calculate pathway scores of a single pathway of a set of samples relative to a reference set of samples
Usage
getPathwayScores(refmat, newmat, w = w_quad)
Arguments
refmat |
Pathway expression matrix of reference samples. Rows are genes, columns are samples. |
newmat |
Pathway expression matrix of new samples. Rows are genes, columns are samples. |
w |
Weight function. Default is quadratic weight function. |
Value
Vector of pathway scores of each sample in newmat.
Examples
## Toy example: 50 reference samples
set.seed(10);
refmat <- matrix(rnorm(5*50),nrow=5,ncol=50); rownames(refmat) <- paste0("g",1:5)
### make g2 and g5 larger in refmat
refmat[2,] <- rnorm(50,3,2); refmat[5,] <- rnorm(50,4,4)
### 15 new samples
newmat <- matrix(rnorm(5*15),nrow=5,ncol=15); rownames(newmat) <- paste0("g",1:5)
### make g2 and g3 larger in newmat
newmat[2,] <- rnorm(15,2,3); newmat[3,] <- rnorm(15,4,3)
ps_new <- getPathwayScores(refmat,newmat) ### get pathway scores of new samples
ps_ref <- getPathwayScores(refmat,refmat) ### get pathway scores of reference samples
ps_both <- getPathwayScores(refmat,cbind(refmat,newmat)) ### get pathway scores of both
# > ps_new
# [1] 6.2720 8.5696 9.9904 6.9056 3.7824 8.9344 13.0880 10.2912 3.7824
# 0.0384 13.1136 6.8032 4.8512 12.8512 10.2912
Make binary template and probability template
Description
Takes in matrix, where columns are samples and rows are pathway genes, outputs the binary and probability templates
Usage
makeBinaryTemplateAndProbabilityTemplate(submat)
Arguments
submat |
A matrix where columns are samples and rows are pathway genes |
Value
List containing binary template vector and probability template vector
Examples
submat <- cbind(c(1,3,2,1.5),c(2,3,1.5,1.2),c(1.4,4.2,3.5,3.8))
rownames(submat) <- c("gene_A","gene_B","gene_C","gene_D")
temp <- makeBinaryTemplateAndProbabilityTemplate(submat)
bt <- temp$binary_template; pt <- temp$probability_template
cbind(bt,pt)
Calculate Pathway Space Matrix
Description
Represents new samples as vectors of pathway scores relative to reference samples
Usage
makeGRAPE_psMat(refge, newge, pathway_list, w = w_quad)
Arguments
refge |
Gene expression matrix of reference samples. Rows are genes, columns are samples. |
newge |
Gene expression matrix of new samples. Rows are genes, columns are samples. |
pathway_list |
List of pathways. Each pathway is a character vector consisting of gene names. |
w |
Weight function. Default is quadratic weight function. |
Value
Vector of pathway scores of each sample in newmat.
Examples
#' ### Make pathway scores mat
set.seed(10)
### 50 reference samples
refge <- matrix(rnorm(10*50),nrow=10,ncol=50); rownames(refge) <- paste0("g",1:10)
refge[c(2,5,8),] <- matrix(rnorm(3*50,mean=2,sd=2))
refge[c(3,4,7),] <- matrix(rnorm(3*50,mean=4,sd=4))
### 6 new samples
newge <- matrix(rnorm(10*6),nrow=10,ncol=6); rownames(newge) <- paste0("g",1:10)
newge[c(2:7),] <- matrix(rnorm(6*6,mean=3,sd=1))
newge[c(1,9),] <- matrix(rnorm(2*6,mean=5,sd=3))
pathway_list <- list(set1=paste0("g",1:4),set2=paste0("g",5:10),set3=paste0("g",c(1,4,8:10)))
psmat <- makeGRAPE_psMat(refge,newge,pathway_list)
# > psmat
# [,1] [,2] [,3] [,4] [,5] [,6]
# set1 2.397426 1.406275 2.516492 2.358809 2.555109 2.358809
# set2 0.670354 3.245575 3.962389 2.670354 1.741150 1.579646
# set3 1.536017 2.167373 2.167373 2.167373 2.148305 1.809322
Make pairwise order representation of a sample
Description
Takes in a vector of gene expression values and returns a binary vector consisting of the pairwise rankings for the sample
Usage
makePairwiseOrder(samp)
Arguments
samp |
A vector of gene expression values |
Value
Binary vector of the pairwise ranking representation of the samples
Examples
samp <- c(1,3,2,1.5)
makePairwiseOrder(samp)
Make template names from gene names
Description
Takes in vector of pathway gene names, returns names corresponding to the pairwise binary representation
Usage
makePairwiseOrderNames(path_genes)
Arguments
path_genes |
A vector of pathway gene names |
Value
Names for the pairwise representation, of the form "gA<gB"
Examples
path_genes <- c("gene_A","gene_B","gene_C","gene_D")
makePairwiseOrderNames(path_genes)
DIRAC Classification
Description
Classification of a samples according to dirac distances from templates. Usually applied to the gene expression values for a single pathway.
Usage
predictClassDIRAC(trainmat, testmat, train_labels)
Arguments
trainmat |
Matrix of gene expression for set of genes accross training set samples. Each column is a sample. |
testmat |
Matrix of gene expression for set of genes accross test set samples. Each column is a sample. |
train_labels |
Vector of class labels for each sample in the training set. |
Value
Predicted class labels for test set
Examples
# Toy example of two classes
set.seed(10); path_genes <- c("gA","gB","gC","gD"); nsamps <- 50 # Four genes, 50 samples per class
class_one_samps <- matrix(NA,nrow=length(path_genes),ncol=nsamps) # Class 1
rownames(class_one_samps) <- path_genes
class_one_samps[1,] <- rnorm(ncol(class_one_samps),4,2)
class_one_samps[2,] <- rnorm(ncol(class_one_samps),5,4)
class_one_samps[3,] <- rnorm(ncol(class_one_samps),1,1)
class_one_samps[4,] <- rnorm(ncol(class_one_samps),2,1)
class_two_samps <- matrix(NA,nrow=length(path_genes),ncol=nsamps) # Class 2
rownames(class_two_samps) <- path_genes
class_two_samps[1,] <- rnorm(ncol(class_two_samps),2,3)
class_two_samps[2,] <- rnorm(ncol(class_two_samps),5,2)
class_two_samps[3,] <- rnorm(ncol(class_two_samps),1,1)
class_two_samps[4,] <- rnorm(ncol(class_two_samps),0,1)
all_samps <- cbind(class_one_samps,class_two_samps)
labels <- c(rep(1,nsamps),rep(2,nsamps))
testid <- sample.int(100,20)
trainmat <- all_samps[,-testid]
train_labels <- labels[-testid]
testmat <- all_samps[,testid]
test_labels <- labels[testid]
yhat <- predictClassDIRAC(trainmat,testmat,train_labels)
sum(diag(table(test_labels,yhat)))/length(test_labels) # accuracy
# [1] 0.7
GRAPE Classification
Description
Classification of a samples according to grape distances from templates. Usually applied to the gene expression values for a single pathway.
Usage
predictClassGRAPE(trainmat, testmat, train_labels, w = w_quad)
Arguments
trainmat |
Matrix of gene expression for set of genes accross training set samples. Each column is a sample. |
testmat |
Matrix of gene expression for set of genes accross test set samples. Each column is a sample. |
train_labels |
Vector of class labels for each sample in the training set. |
w |
Weight function. Default is quadratic weight function. |
Value
Predicted class labels for test set
Examples
# Toy example of two classes
set.seed(10); path_genes <- c("gA","gB","gC","gD"); nsamps <- 50 # Four genes, 50 samples per class
class_one_samps <- matrix(NA,nrow=length(path_genes),ncol=nsamps) # Class 1
rownames(class_one_samps) <- path_genes
class_one_samps[1,] <- rnorm(ncol(class_one_samps),4,2)
class_one_samps[2,] <- rnorm(ncol(class_one_samps),5,4)
class_one_samps[3,] <- rnorm(ncol(class_one_samps),1,1)
class_one_samps[4,] <- rnorm(ncol(class_one_samps),2,1)
class_two_samps <- matrix(NA,nrow=length(path_genes),ncol=nsamps) # Class 2
rownames(class_two_samps) <- path_genes
class_two_samps[1,] <- rnorm(ncol(class_two_samps),2,3)
class_two_samps[2,] <- rnorm(ncol(class_two_samps),5,2)
class_two_samps[3,] <- rnorm(ncol(class_two_samps),1,1)
class_two_samps[4,] <- rnorm(ncol(class_two_samps),0,1)
all_samps <- cbind(class_one_samps,class_two_samps)
labels <- c(rep(1,nsamps),rep(2,nsamps))
testid <- sample.int(100,20)
trainmat <- all_samps[,-testid]
train_labels <- labels[-testid]
testmat <- all_samps[,testid]
test_labels <- labels[testid]
yhat <- predictClassGRAPE(trainmat,testmat,train_labels,w_quad)
sum(diag(table(test_labels,yhat)))/length(test_labels) # accuracy
# [1] 0.8
PC Classification
Description
Classification of a samples according to euclidean distances from PC templates. Usually applied to the gene expression values for a single pathway.
Usage
predictClassPC(trainmat, testmat, train_labels)
Arguments
trainmat |
Matrix of gene expression for set of genes accross training set samples. Each column is a sample. |
testmat |
Matrix of gene expression for set of genes accross test set samples. Each column is a sample. |
train_labels |
Vector of class labels for each sample in the training set. |
Value
Predicted class labels for test set
Examples
# Toy example of two classes
set.seed(10); path_genes <- c("gA","gB","gC","gD"); nsamps <- 50 # Four genes, 50 samples per class
class_one_samps <- matrix(NA,nrow=length(path_genes),ncol=nsamps) # Class 1
rownames(class_one_samps) <- path_genes
class_one_samps[1,] <- rnorm(ncol(class_one_samps),4,2)
class_one_samps[2,] <- rnorm(ncol(class_one_samps),5,4)
class_one_samps[3,] <- rnorm(ncol(class_one_samps),1,1)
class_one_samps[4,] <- rnorm(ncol(class_one_samps),2,1)
class_two_samps <- matrix(NA,nrow=length(path_genes),ncol=nsamps) # Class 2
rownames(class_two_samps) <- path_genes
class_two_samps[1,] <- rnorm(ncol(class_two_samps),2,3)
class_two_samps[2,] <- rnorm(ncol(class_two_samps),5,2)
class_two_samps[3,] <- rnorm(ncol(class_two_samps),1,1)
class_two_samps[4,] <- rnorm(ncol(class_two_samps),0,1)
all_samps <- cbind(class_one_samps,class_two_samps)
labels <- c(rep(1,nsamps),rep(2,nsamps))
testid <- sample.int(100,20)
trainmat <- all_samps[,-testid]
train_labels <- labels[-testid]
testmat <- all_samps[,testid]
test_labels <- labels[testid]
yhat <- predictClassPC(trainmat,testmat,train_labels)
sum(diag(table(test_labels,yhat)))/length(test_labels) # accuracy
# [1] 0.55
Quadratic weight function
Description
Calculates the weights of all input entries. All entries should take values in [0,1].
Usage
w_quad(x)
Arguments
x |
Any number, vector of matrix. |
Value
Weight of each element
Examples
w_quad(0.95)
w_quad(cbind(c(.7,.8),c(.9,.1)))