---
title: "A beginner's guide to creating a bulkAnalyseR app from a GEO dataset"
output:
rmarkdown::html_vignette:
vignette: >
%\VignetteIndexEntry{A beginner's guide to creating a bulkAnalyseR app from a GEO dataset}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
```{r options, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "##>"
)
Sys.setenv("VROOM_CONNECTION_SIZE" = 1e6)
```
In this short tutorial we showcase a simple pipeline to create a bulkAnalyseR app using a publicly available dataset from the [Gene Expression Omnibus (GEO)](https://www.ncbi.nlm.nih.gov/geo/). No pre-requisites are required, as the installation of bulkAnalyseR and download of the data are included.
The example app described in this vignette can be found [here](https://bioinf.stemcells.cam.ac.uk/shiny/bulkAnalyseR/GEO/).
## Installation
First, install the latest version of bulkAnalyseR, starting with the CRAN and Bioconductor dependencies:
```{r cran_install, eval = FALSE}
packages.cran <- c(
"ggplot2", "shiny", "shinythemes", "gprofiler2", "stats", "ggrepel",
"utils", "RColorBrewer", "circlize", "shinyWidgets", "shinyjqui",
"dplyr", "magrittr", "ggforce", "rlang", "glue", "matrixStats",
"noisyr", "tibble", "ggnewscale", "ggrastr", "visNetwork", "shinyLP",
"grid", "DT", "scales", "shinyjs", "tidyr", "UpSetR", "ggVennDiagram"
)
new.packages.cran <- packages.cran[!(packages.cran %in% installed.packages()[, "Package"])]
if(length(new.packages.cran))
install.packages(new.packages.cran)
packages.bioc <- c(
"edgeR", "DESeq2", "preprocessCore", "GENIE3", "ComplexHeatmap"
)
new.packages.bioc <- packages.bioc[!(packages.bioc %in% installed.packages()[,"Package"])]
if(length(new.packages.bioc)){
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(new.packages.bioc)
}
install.packages("bulkAnalyseR")
```
## Download data and create app
### Get the expression matrix
We start by downloading and reading in the expression matrix. Rows represent genes/features and columns represent samples (note you need an internet connection to run the code below). The matrix is from [a 2022 study](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE178620) on the Stem Cell transcriptional response to Microglia-Conditioned Media. We only use a few samples in the study for illustrative purposes.
```{r read}
download_path <- paste0(tempdir(), "expression_matrix.csv.gz")
download.file(
"https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE178620&format=file&file=GSE178620%5Fraw%5Fabundances%2Ecsv%2Egz",
download_path
)
exp <- as.matrix(read.csv(download_path, row.names = 1))[, c(1,2,19,20)]
head(exp)
```
```{r clean up, include = FALSE}
file.remove(download_path)
```
### Defining metadata
We use a very simple metadata table with just the main condition in the experiment. Detailed metadata is available for all GEO datasets and can be downloaded and used instead.
```{r meta}
meta <- data.frame(
name = colnames(exp),
condition = sapply(colnames(exp), USE.NAMES = FALSE, function(nm){
strsplit(nm, "_")[[1]][1]
})
)
meta
```
### Pre-processing
We can now denoise and normalise the data using bulkAnalyseR
```{r preprocess,fig.width=7, fig.height=5}
exp.proc <- bulkAnalyseR::preprocessExpressionMatrix(exp, output.plot = TRUE)
```
### Creating the shiny app
Finally, we can create a shiny app. This example app can be found [here](https://bioinf.stemcells.cam.ac.uk/shiny/bulkAnalyseR/GEO/).
```{r generate app, eval=FALSE}
bulkAnalyseR::generateShinyApp(
shiny.dir = "shiny_GEO",
app.title = "Shiny app for visualisation of GEO data",
modality = "RNA",
expression.matrix = exp.proc,
metadata = meta,
organism = "hsapiens",
org.db = "org.Hs.eg.db"
)
```
```{r sessionInfo}
sessionInfo()
```