Version: | 1.3 |
Date: | 2025-06-04 |
Title: | Automate the Mapping Between a List of Genes and Gene Ontology Categories |
Maintainer: | Barry Zeeberg <barryz2013@gmail.com> |
Author: | Barry Zeeberg [aut, cre] |
Depends: | R (≥ 4.2.0) |
Imports: | minimalistGODB, HGNChelper, randomGODB, stats, gplots, grDevices, utils, vprint |
LazyData: | true |
LazyDataCompression: | xz |
Description: | In gene-expression microarray studies, for example, one generally obtains a list of dozens or hundreds of genes that differ in expression between samples and then asks 'What does all of this mean biologically?' Alternatively, gene lists can be derived conceptually in addition to experimentally. For instance, one might want to analyze a group of genes known as housekeeping genes. The work of the Gene Ontology (GO) Consortium <geneontology.org> provides a way to address that question. GO organizes genes into hierarchical categories based on biological process, molecular function and subcellular localization. The role of 'GoMiner' is to automate the mapping between a list of genes and GO, and to provide a statistical summary of the results as well as a visualization. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
VignetteBuilder: | knitr |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
RoxygenNote: | 7.3.2 |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-06-05 22:39:48 UTC; barryzeeberg |
Repository: | CRAN |
Date/Publication: | 2025-06-05 23:10:02 UTC |
FDR
Description
compute the false discovery rate (FDR) of the hypergeometric p values of genes mapping to gene ontology (GO) categories
Usage
FDR(sampleList, tablePop3, hyper, GOGOA3, nrand, ontology, subd, opt = 0)
Arguments
sampleList |
character vector of user-supplied genes of interest |
tablePop3 |
return value of GOtable3() |
hyper |
return value of GOhypergeometric3() |
GOGOA3 |
return value of subsetGOGOA() |
nrand |
integer number of randomizations |
ontology |
c("molecular_function","cellular_component","biological_process") |
subd |
character string pathname for directory containing sink.txt |
opt |
integer 0:1 parameter used to determine randomization method |
Value
returns a list with FDR information
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# you can generate it using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
fdr<-FDR(x_sampleList1,x_tablePop31,x_hyper1,GOGOA3,3,"biological_process",tempdir(),0)
## End(Not run)
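The general idea of a randomization-based FDR can be sketched in base R. This is only an illustration under stated assumptions, not the code used by FDR(): the names estimate_fdr, p_real and p_rand are hypothetical, and the estimator (mean number of randomized p values at or below a real p value, divided by the number of real p values at or below it) is one common choice that may differ from the package's.

```r
# Hypothetical helper (not part of the package): randomization-based FDR estimate.
# p_real: numeric vector of p values from the real gene list
# p_rand: list of numeric vectors of p values, one vector per randomization
estimate_fdr <- function(p_real, p_rand) {
  nrand <- length(p_rand)
  vapply(p_real, function(p) {
    # mean number of randomized categories passing threshold p
    false_pos <- sum(vapply(p_rand, function(r) sum(r <= p), integer(1))) / nrand
    # number of real categories passing threshold p (always >= 1)
    called <- sum(p_real <= p)
    min(1, false_pos / called)
  }, numeric(1))
}

set.seed(1)
p_real <- c(1e-6, 1e-4, 0.01, 0.2, 0.7)             # toy values
p_rand <- replicate(3, runif(5), simplify = FALSE)  # toy randomizations
fdr_est <- estimate_fdr(p_real, p_rand)
```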
GoMiner data set
Description
GoMiner data set
Usage
data(GOGOA3small)
GOenrich3
Description
compute the gene enrichment in a GO category
Usage
GOenrich3(tableSample3, tablePop3)
Arguments
tableSample3 |
sample return value of GOtable3() |
tablePop3 |
population return value of GOtable3() |
Value
returns a matrix with columns c("SAMPLE","POP","ENRICHMENT")
Examples
m<-GOenrich3(x_tableSample3,x_tablePop3)
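As a rough illustration of what an enrichment value can mean, the sketch below uses toy counts. It assumes that enrichment is the ratio of the sample fraction to the population fraction for a category; that definition, the counts and the variable names are assumptions and are not taken from the package source.

```r
# Toy counts (hypothetical): genes per GO category in the sample and in the population
sample_counts <- c(catA = 4, catB = 1)
pop_counts    <- c(catA = 40, catB = 100)
n_sample <- 10    # sample genes that map to the database
n_pop    <- 1000  # genes in the whole database

# Assumed definition: sample fraction divided by population fraction
enrichment <- (sample_counts / n_sample) / (pop_counts / n_pop)
m_toy <- cbind(SAMPLE = sample_counts, POP = pop_counts, ENRICHMENT = enrichment)
```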
GOheatmap
Description
generate a matrix to be used as input to a heat map
Usage
GOheatmap(sampleList, x, thresh, fdrThresh = 0.105, verbose)
Arguments
sampleList |
character list of gene names |
x |
DB component of return value of GOtable3() |
thresh |
output of GOthresh() |
fdrThresh |
numeric value of FDR acceptance threshold |
verbose |
integer vector representing classes |
Value
returns a matrix to be used as input to a heat map
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# you can generate it using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
heatmap<-GOheatmap(cluster52,GOGOA3$ontologies[["biological_process"]],x_thresh,verbose=1)
## End(Not run)
GOhypergeometric3
Description
compute the hypergeometric p value for gene enrichment in a GO category
Usage
GOhypergeometric3(tableSample3, tablePop3)
Arguments
tableSample3 |
sample return value of GOtable3() |
tablePop3 |
population return value of GOtable3() |
Value
returns a matrix with columns c("x","m","n","k","p")
Examples
hyper<-GOhypergeometric3(x_tableSample3,x_tablePop3)
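The column names c("x","m","n","k","p") match the argument names of stats::phyper(). A minimal sketch of a hypergeometric enrichment p value with toy numbers follows. Using the upper tail (probability of x or more hits) is an assumption about how the p value is defined, not something stated in this manual.

```r
# Toy numbers (hypothetical), named after the arguments of stats::phyper()
x <- 4    # sample genes that map to the category
m <- 40   # population genes in the category
n <- 960  # population genes not in the category
k <- 10   # number of genes in the sample

# Upper-tail probability of observing x or more category genes in the sample
p <- phyper(x - 1, m, n, k, lower.tail = FALSE)
hyper_toy <- cbind(x = x, m = m, n = n, k = k, p = p)
```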
GOtable3
Description
tabulate number of geneList mappings to GO categories
Usage
GOtable3(hgncList, DB)
Arguments
hgncList |
character list of gene names |
DB |
selected ontology branch of return value of subsetGOGOA |
Value
returns a list whose components are c("DB","table","ngenes") where 'DB' is the GO DB subsetted to the desired ONTOLOGY, and 'table' is tabulation of number of occurrences of each GO category name within the desired ONTOLOGY, and ngenes is the total number of hgncList genes mapping to GOGOA
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# you can generate it using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
x<-GOtable3(cluster52,GOGOA3$ontologies[["biological_process"]])
## End(Not run)
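The tabulation step can be tried without the large database. The sketch below builds a tiny stand-in matrix with the column names "HGNC" and "GO_NAME" used in the examples of this manual; the rows are hypothetical, and the code is an illustration of the idea rather than the body of GOtable3().

```r
# Toy database (hypothetical rows)
DB <- cbind(
  HGNC    = c("TP53", "TP53", "FN1", "BRCA1", "FN1"),
  GO_NAME = c("GO_A", "GO_B", "GO_A", "GO_B", "GO_C")
)
hgncList <- c("TP53", "FN1", "NOTINDB")

# Keep only rows whose gene is in the list, then count rows per category
hits   <- DB[DB[, "HGNC"] %in% hgncList, , drop = FALSE]
tab    <- sort(table(hits[, "GO_NAME"]), decreasing = TRUE)
# Number of listed genes that map to the database at all
ngenes <- length(unique(hits[, "HGNC"]))
```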
GOthresh
Description
retrieve rows of m that meet all of the thresholds (enrichThresh, countThresh, pvalThresh and fdrThresh)
Usage
GOthresh(m, sampleFDR, enrichThresh, countThresh, pvalThresh, fdrThresh)
Arguments
m |
return value of GOenrich3() |
sampleFDR |
component of return value of RCPD() |
enrichThresh |
numerical acceptance threshold for enrichment |
countThresh |
numerical acceptance threshold for gene count |
pvalThresh |
numerical acceptance threshold for pval |
fdrThresh |
numerical acceptance threshold for fdr |
Value
returns a subset of matrix (m joined with fdr$sampleFDR) with entries meeting all thresholds
Examples
thresh<-GOthresh(x_m,x_fdr$sampleFDR,enrichThresh=2,countThresh=2,pvalThresh=0.1,fdrThresh=0.100)
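Filtering a results matrix on several thresholds at once can be sketched as follows. The matrix values and column names are hypothetical, and whether the package compares with >= and <= (rather than strict inequalities) is an assumption.

```r
# Toy results matrix (hypothetical values), one row per GO category
res <- cbind(
  SAMPLE     = c(6, 2, 8),
  ENRICHMENT = c(5.0, 9.0, 1.2),
  p          = c(0.001, 0.02, 0.40),
  FDR        = c(0.01, 0.30, 0.90)
)
rownames(res) <- c("GO_A", "GO_B", "GO_C")

# A row is kept only if it passes every threshold
keep <- res[, "ENRICHMENT"] >= 2 &
  res[, "SAMPLE"] >= 5 &
  res[, "p"] <= 0.1 &
  res[, "FDR"] <= 0.1
res_kept <- res[keep, , drop = FALSE]
```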
GoMiner
Description
driver to generate heatmap
Usage
GoMiner(
title = NULL,
dir,
sampleList,
GOGOA3,
ontology,
enrichThresh = 2,
countThresh = 5,
pvalThresh = 0.1,
fdrThresh = 0.1,
nrand = 100,
mn = 2,
mx = 200,
opt,
verbose = 1
)
Arguments
title |
character string descriptive title |
dir |
character string full pathname to the directory acting as the result repository |
sampleList |
character list of gene names |
GOGOA3 |
return value of subsetGOGOA() |
ontology |
character string c("molecular_function", "cellular_component", "biological_process") |
enrichThresh |
numerical acceptance threshold for enrichment |
countThresh |
numerical acceptance threshold for gene count |
pvalThresh |
numerical acceptance threshold for pval |
fdrThresh |
numerical acceptance threshold for fdr |
nrand |
numeric number of randomizations to compute FDR |
mn |
integer param passed to trimGOGOA3, min size threshold for a category |
mx |
integer param passed to trimGOGOA3, max size threshold for a category |
opt |
integer 0:1 parameter used to select randomization method |
verbose |
integer vector representing classes |
Details
modes of FDR estimation:
opt=0 use original database with randomized geneLists
opt=1 use original geneList with internally scrambled gene databases (uses randomGODB())
databases that can be used with the real geneList (these are explicitly passed as a parameter to GoMiner()):
(1) original GOGOA3
(2) randomized version of GOGOA3: GOGOA3R<-randomGODB(GOGOA3)
(3) database containing a subset of the mappings of the big-hitter genes (randomGODB2driver()). This attempts to compensate for the over-annotation of some genes, which might lead to false positives. If gene G has a lot of mappings to categories, randomly sample G/category pairs to retain a reasonable number of them, e.g., reduce G from 100 category mappings to 7 category mappings by omitting 93 of the G/category mappings
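The down-sampling of over-annotated genes described above can be illustrated in base R. This is a sketch under stated assumptions and not the code of randomGODB2driver(): the toy database, the cap max_map and the variable names are all hypothetical.

```r
set.seed(42)
# Toy database (hypothetical): gene G has 100 category mappings, gene H has 3
DB <- cbind(
  HGNC    = c(rep("G", 100), rep("H", 3)),
  GO_NAME = c(paste0("GO_", 1:100), paste0("GO_", 1:3))
)
max_map <- 7  # hypothetical cap on the number of mappings retained per gene

# For each gene, keep all rows if at or below the cap,
# otherwise keep a random sample of max_map rows
rows_by_gene <- split(seq_len(nrow(DB)), DB[, "HGNC"])
keep <- unlist(lapply(rows_by_gene, function(idx) {
  if (length(idx) <= max_map) idx else sample(idx, max_map)
}))
DB_small <- DB[sort(keep), , drop = FALSE]
```

In this toy case G is reduced from 100 mappings to 7, while H keeps its 3 mappings.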
Value
returns a matrix suitable to generate a heatmap
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# you can generate it using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
l<-GoMiner("Cluster52",tempdir(),cluster52,
GOGOA3=GOGOA3,ontology="biological_process",enrichThresh=2,
countThresh=5,pvalThresh=0.10,fdrThresh=0.10,nrand=2,mn=2,mx=200,opt=0,verbose=1)
# try out yeast database!
load("/Users/barryzeeberg/personal/GODB_RDATA/sgd/GOGOA3_sgd.RData")
# make sure this is in fact the database for the desired species
GOGOA3$species
# use database to find genes mapping to an interesting category
cat<-"GO_0042149__cellular_response_to_glucose_starvation"
w<-which(GOGOA3$ontologies[["biological_process"]][,"GO_NAME"]==cat)
geneList<-GOGOA3$ontologies[["biological_process"]][w,"HGNC"]
l<-GoMiner("YEAST",tempdir(),geneList,
GOGOA3,ontology="biological_process",enrichThresh=2,
countThresh=3,pvalThresh=0.10,fdrThresh=0.10,nrand=2,mn=2,mx=200,opt=0)
## End(Not run)
GoMiner data set
Description
GoMiner data set
Usage
data(HCCS66)
GoMiner data set
Description
GoMiner data set
Usage
data(Housekeeping_Genes)
RCPD
Description
prepare a cpd of p values from randomized gene sets
Usage
RCPD(GOGOA3, tablePop, geneList, nrand, ontology, hyper, subd, opt)
Arguments
GOGOA3 |
return value of subsetGOGOA() |
tablePop |
return value of GOtable3() |
geneList |
character vector of genes to randomize |
nrand |
integer number of randomizations |
ontology |
c("molecular_function","cellular_component","biological_process") |
hyper |
return value of GOhypergeometric3() from real (nonrandom) data |
subd |
character string pathname for directory containing sink.txt |
opt |
integer 0:1 parameter used to select randomization method |
Details
the cpd of the randomizations is to be used for estimating the false discovery rate (FDR) of the real sampled genes
Value
returns a histogram of log10(p)
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# you can generate it using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
rcpd<-RCPD(GOGOA3,x_tablePop31,x_sampleList1,3,"biological_process",x_hyper1,tempdir(),0)
## End(Not run)
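A cumulative distribution and a histogram of log10(p) can be built from any vector of p values with base R. The sketch below uses uniform random numbers as hypothetical stand-ins for p values from randomized gene sets; it illustrates the data structures only and is not the implementation of RCPD().

```r
set.seed(7)
# Hypothetical p values pooled from randomized gene lists
p_rand <- runif(200)

# Empirical cumulative distribution of log10(p)
cpd <- ecdf(log10(p_rand))

# Fraction of randomized p values at or below 0.05
frac <- cpd(log10(0.05))

# Histogram of log10(p), computed without drawing a plot
h <- hist(log10(p_rand), plot = FALSE)
```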
checkGeneListVsDB
Description
determine if gene list and database contain compatible identifiers
Usage
checkGeneListVsDB(geneList, ontology, GOGOA3, thresh = 0.5, verbose = FALSE)
Arguments
geneList |
character list of gene names |
ontology |
character string c("molecular_function", "cellular_component", "biological_process") |
GOGOA3 |
return value of subsetGOGOA() |
thresh |
numeric acceptance threshold for fraction of gene list matching database identifiers |
verbose |
integer vector representing classes |
Value
returns no value, but may have side effect of aborting the computation
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# you can generate it using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
checkGeneListVsDB(geneList=cluster52,ontology="biological_process",
GOGOA3,thresh=0.5,verbose=TRUE)
# supposed to generate error message
load("/Users/barryzeeberg/personal/GODB_RDATA/sgd/GOGOA3_sgd.RData")
checkGeneListVsDB(geneList=xenopusGenes,ontology="biological_process",
GOGOA3,thresh=0.5,verbose=TRUE)
## End(Not run)
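The compatibility check can be pictured as computing the fraction of gene-list identifiers found in the database and aborting when that fraction falls below thresh. The helper check_overlap and the toy identifiers below are hypothetical; this is a sketch of the idea, not the package's code.

```r
# Hypothetical helper: stop if too few gene-list identifiers are found in the database
check_overlap <- function(geneList, dbGenes, thresh = 0.5) {
  frac <- mean(unique(geneList) %in% dbGenes)
  if (frac < thresh) {
    stop(sprintf("only %.0f%% of the gene list matches the database", 100 * frac))
  }
  invisible(frac)
}

dbGenes <- c("TP53", "FN1", "BRCA1", "EGFR")

# 2 of 3 identifiers match, so the check passes
ok <- check_overlap(c("TP53", "FN1", "XYZ1"), dbGenes)

# 1 of 3 identifiers match, so the check raises an error, captured here as text
bad <- tryCatch(
  check_overlap(c("abc1", "def2", "TP53"), dbGenes),
  error = function(e) conditionMessage(e)
)
```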
GoMiner data set
Description
GoMiner data set
Usage
data(cluster52)
hitterBeforeAfterDriver
Description
driver to invoke hitters2() and trimGOGOA3()
Usage
hitterBeforeAfterDriver(GOGOA3, mn = 20, mx = 200, verbose)
Arguments
GOGOA3 |
return value of minimalistGODB::buildGODatabase() |
mn |
integer minimum category size |
mx |
integer maximum category size |
verbose |
integer vector representing classes |
Value
returns the return value of trimGOGOA3()
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# This example is given in full detail in the package vignette.
# You can generate GOGOA3.RData using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO
dir<-"/Users/barryzeeberg/personal/GODB_RDATA/goa_human/"
load(sprintf("%s/%s",dir,"GOGOA3_goa_human.RData"))
geneList<-GOGOA3$ontologies[["biological_process"]][1:10,"HGNC"]
GOGOA3tr<-hitterBeforeAfterDriver(GOGOA3,mn=20,mx=200,1)
## End(Not run)
hitters2
Description
determine the number of mappings for the top several genes
Usage
hitters2(GOGOA3, verbose = 1)
Arguments
GOGOA3 |
return value of minimalistGODB::buildGODatabase() |
verbose |
integer vector representing classes |
Value
returns no value, but has side effect of printing information
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# This example is given in full detail in the package vignette.
# You can generate GOGOA3.RData using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO
dir<-"/Users/barryzeeberg/personal/GODB_RDATA/goa_human/"
load(sprintf("%s/%s",dir,"GOGOA3_goa_human.RData"))
geneList<-GOGOA3$ontologies[["biological_process"]][1:10,"HGNC"]
hitters2(GOGOA3,1)
## End(Not run)
human
Description
determine if database represents human species
Usage
human(GOGOA3, verbose = TRUE)
Arguments
GOGOA3 |
return value of subsetGOGOA() |
verbose |
integer vector representing classes |
Value
returns Boolean TRUE if species is human
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# you can generate it using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
hum<-human(GOGOA3)
load("/Users/barryzeeberg/personal/GODB_RDATA/sgd/GOGOA3_sgd.RData")
hum<-human(XENOPUS,1)
## End(Not run)
preprocessDB
Description
driver to perform several preprocessing steps: take a quick peek at the database; trim small and large categories; determine whether the database is for the human species; validate HGNC symbols in sampleList; determine whether the human database is up to date (i.e., contains GOGOA3$species) or a legacy version
Usage
preprocessDB(sampleList, GOGOA3, ontology, mn, mx, thresh, verbose)
Arguments
sampleList |
character list of gene names |
GOGOA3 |
return value of subsetGOGOA() |
ontology |
character string c("molecular_function", "cellular_component", "biological_process") |
mn |
integer param passed to trimGOGOA3, min size threshold for a category |
mx |
integer param passed to trimGOGOA3, max size threshold for a category |
thresh |
numerical parameter passed to checkGeneListVsDB() |
verbose |
integer vector representing classes |
Value
returns a list whose components are a trimmed version of GOGOA3 and (for human) a sampleList with validated HGNC symbols
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# you can generate it using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
pp<-preprocessDB(cluster52,GOGOA3,"biological_process",20,200,0.5,3)
## End(Not run)
randSubsetGeneList
Description
retrieve n unique random genes
Usage
randSubsetGeneList(geneList, ngenes)
Arguments
geneList |
character vector geneList |
ngenes |
integer desired number of random genes |
Value
returns a character vector of genes
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# you can generate it using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
genes<-randSubsetGeneList(GOGOA3$genes[["biological_process"]],20)
## End(Not run)
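Drawing n unique random genes can be done with base R alone. The following is a minimal sketch with a toy gene vector, assuming that duplicates are removed before sampling; it is not necessarily how randSubsetGeneList() is written.

```r
set.seed(123)
# Toy gene vector (hypothetical) containing duplicates
geneList <- c("TP53", "FN1", "TP53", "BRCA1", "EGFR", "FN1", "MYC")
ngenes <- 3

# Remove duplicates first so that the same gene cannot be drawn twice
genes <- sample(unique(geneList), ngenes)
```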
runGoMinerExamples
Description
driver to run GoMiner under several randomization procedures
Usage
runGoMinerExamples(
title = NULL,
dir,
sampleList,
GOGOA3,
ontology,
enrichThresh = 2,
countThresh = 5,
pvalThresh = 0.1,
fdrThresh = 0.1,
nrand = 2,
mn = 2,
mx = 200,
verbose = 1
)
Arguments
title |
character string descriptive title |
dir |
character string full pathname to the directory acting as the result repository |
sampleList |
character list of gene names |
GOGOA3 |
return value of subsetGOGOA() |
ontology |
character string c("molecular_function", "cellular_component", "biological_process") |
enrichThresh |
numerical acceptance threshold for enrichment |
countThresh |
numerical acceptance threshold for gene count |
pvalThresh |
numerical acceptance threshold for pval |
fdrThresh |
numerical acceptance threshold for fdr |
nrand |
numeric number of randomizations to compute FDR |
mn |
integer param passed to trimGOGOA3, min size threshold for a category |
mx |
integer param passed to trimGOGOA3, max size threshold for a category |
verbose |
integer vector representing classes |
Value
returns a list containing the return value of GoMiner()
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# you can generate it using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
load("/Users/barryzeeberg/personal/GODB_RDATA/goa_human/GOGOA3_goa_human.RData")
ontology<-"biological_process"
t<-sort(table(GOGOA3$ontologies[[ontology]][,"HGNC"]),decreasing=TRUE)
dir<-tempdir()
sampleList<-names(t)[1:50]
title<-"hi_hitters"
hh<-runGoMinerExamples(title,dir,sampleList,GOGOA3,ontology,nrand=5)
sampleList<-names(t)[1001:1050]
title<-"hi_hitters5"
hh<-runGoMinerExamples(title,dir,sampleList,GOGOA3,ontology,nrand=5)
sampleList<-cluster52
title<-"cluster52"
hh<-runGoMinerExamples(title,dir,sampleList,GOGOA3,ontology,nrand=5)
## End(Not run)
trimGOGOA3
Description
remove categories from GOGOA3 that are too small or too large
Usage
trimGOGOA3(GOGOA3, mn, mx, verbose)
Arguments
GOGOA3 |
return value of subsetGOGOA() |
mn |
integer min size threshold for a category |
mx |
integer max size threshold for a category |
verbose |
integer vector representing classes |
Details
If a category is too small, it is unreliable for statistical evaluation. Also, in the extreme case of size = 1, that category is essentially equivalent to a gene rather than a category. The same is partially true for size = 2. If a category is too large, it is too generic to be useful for categorization. Finally, by trimming the database, analyses will run faster.
Value
returns trimmed version of GOGOA3
Examples
## Not run:
# GOGOA3.RData is too large to include in the R package
# so I need to load it from a file that is not in the package.
# Since this is in a file in my own file system, I could not
# include this as a regular example in the package.
# This example is given in full detail in the package vignette.
# You can generate GOGOA3.RData using the package 'minimalistGODB'
# or you can retrieve it from https://github.com/barryzee/GO/tree/main/databases
GOGOA3tr<-trimGOGOA3(GOGOA3,mn=2,mx=200,1)
## End(Not run)
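Trimming by category size can be illustrated on a toy matrix. The sketch assumes that category size is the number of rows (gene/category pairs) for that category and that the bounds mn and mx are inclusive; both points, as well as the toy rows, are assumptions rather than facts about trimGOGOA3().

```r
# Toy database (hypothetical): categories of size 4, 2 and 1
DB <- cbind(
  HGNC    = c("G1", "G2", "G3", "G4", "G1", "G2", "G1"),
  GO_NAME = c(rep("GO_BIG", 4), rep("GO_MID", 2), "GO_ONE")
)
mn <- 2
mx <- 3

# Count rows per category and keep categories whose size lies within [mn, mx]
size      <- table(DB[, "GO_NAME"])
keep_cats <- names(size)[size >= mn & size <= mx]
DB_trim   <- DB[DB[, "GO_NAME"] %in% keep_cats, , drop = FALSE]
```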
validHGNCSymbols
Description
convert outdated HGNC symbols to current HGNC symbols
Usage
validHGNCSymbols(geneList)
Arguments
geneList |
character vector of HGNC symbols |
Details
removes NA and /// from output of checkGeneSymbols()
Value
returns list of mapping table and vector of current HGNC symbols
Examples
geneList<-c("FN1", "tp53", "UNKNOWNGENE","7-Sep",
"9/7", "1-Mar", "Oct4", "4-Oct","OCT4-PG4", "C19ORF71",
"C19orf71")
l<-validHGNCSymbols(geneList)
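The clean-up described in Details (removing NA and /// entries) can be sketched on a hand-built table so that it runs without HGNChelper installed. The table below is a hypothetical stand-in: the column names x and Suggested.Symbol are assumed to mirror the output of checkGeneSymbols(), and GENEX, GENEA and GENEB are made-up symbols used only to show an ambiguous mapping.

```r
# Hand-built stand-in for a symbol-checking table (hypothetical rows)
tab <- data.frame(
  x = c("FN1", "tp53", "UNKNOWNGENE", "GENEX"),
  Suggested.Symbol = c("FN1", "TP53", NA, "GENEA /// GENEB"),
  stringsAsFactors = FALSE
)
sym <- tab$Suggested.Symbol

# Drop unmapped (NA) and ambiguous ("///") suggestions
current <- sym[!is.na(sym) & !grepl("///", sym, fixed = TRUE)]
```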
GoMiner data set
Description
GoMiner data set
Usage
data(x_fdr)
GoMiner data set
Description
GoMiner data set
Usage
data(x_hyper1)
GoMiner data set
Description
GoMiner data set
Usage
data(x_m)
GoMiner data set
Description
GoMiner data set
Usage
data(x_sampleList1)
GoMiner data set
Description
GoMiner data set
Usage
data(x_tablePop3)
GoMiner data set
Description
GoMiner data set
Usage
data(x_tablePop31)
GoMiner data set
Description
GoMiner data set
Usage
data(x_tableSample3)
GoMiner data set
Description
GoMiner data set
Usage
data(x_thresh)