---
title: "A beginner's guide to creating a bulkAnalyseR app from a GEO dataset"
output:
  rmarkdown::html_vignette:
vignette: >
  %\VignetteIndexEntry{A beginner's guide to creating a bulkAnalyseR app from a GEO dataset}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r options, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##>"
)
Sys.setenv("VROOM_CONNECTION_SIZE" = 1e6)
```

In this short tutorial we showcase a simple pipeline to create a bulkAnalyseR app using a publicly available dataset from the [Gene Expression Omnibus (GEO)](https://www.ncbi.nlm.nih.gov/geo/). There are no prerequisites, as the installation of bulkAnalyseR and the download of the data are both included.

The example app described in this vignette can be found [here](https://bioinf.stemcells.cam.ac.uk/shiny/bulkAnalyseR/GEO/).

## Installation

First, install the latest version of bulkAnalyseR, starting with the CRAN and Bioconductor dependencies:

```{r cran_install, eval = FALSE}
# CRAN dependencies: install only those that are not already present
packages.cran <- c(
  "ggplot2", "shiny", "shinythemes", "gprofiler2", "stats", "ggrepel",
  "utils", "RColorBrewer", "circlize", "shinyWidgets", "shinyjqui",
  "dplyr", "magrittr", "ggforce", "rlang", "glue", "matrixStats",
  "noisyr", "tibble", "ggnewscale", "ggrastr", "visNetwork", "shinyLP",
  "grid", "DT", "scales", "shinyjs", "tidyr", "UpSetR", "ggVennDiagram"
)
new.packages.cran <- packages.cran[!(packages.cran %in% installed.packages()[, "Package"])]
if (length(new.packages.cran)) install.packages(new.packages.cran)

# Bioconductor dependencies: installed through BiocManager
packages.bioc <- c(
  "edgeR", "DESeq2", "preprocessCore", "GENIE3", "ComplexHeatmap"
)
new.packages.bioc <- packages.bioc[!(packages.bioc %in% installed.packages()[, "Package"])]
if (length(new.packages.bioc)) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  BiocManager::install(new.packages.bioc)
}

install.packages("bulkAnalyseR")
```

## Download data and create app

### Get the expression matrix

We start by downloading and reading in the expression matrix. Rows represent genes/features and columns represent samples. Note that you need an internet connection to run the code below.
The matrix is from [a 2022 study](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE178620) on the Stem Cell transcriptional response to Microglia-Conditioned Media. We use only a few samples from the study for illustrative purposes.

```{r read}
# Build the path with file.path() so the file lands inside tempdir()
download_path <- file.path(tempdir(), "expression_matrix.csv.gz")
download.file(
  "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE178620&format=file&file=GSE178620%5Fraw%5Fabundances%2Ecsv%2Egz",
  download_path,
  mode = "wb" # binary mode keeps the gzip archive intact on Windows
)
# Keep four samples only (columns 1, 2, 19 and 20)
exp <- as.matrix(read.csv(download_path, row.names = 1))[, c(1, 2, 19, 20)]
head(exp)
```

```{r clean up, include = FALSE}
file.remove(download_path)
```

### Defining metadata

We use a very simple metadata table with just the main condition in the experiment. Detailed metadata is available for all GEO datasets and can be downloaded and used instead.

```{r meta}
# The condition is the part of each sample name before the first underscore
meta <- data.frame(
  name = colnames(exp),
  condition = sapply(colnames(exp), USE.NAMES = FALSE, function(nm) {
    strsplit(nm, "_")[[1]][1]
  })
)
meta
```

### Pre-processing

We can now denoise and normalise the data using bulkAnalyseR.

```{r preprocess, fig.width = 7, fig.height = 5}
exp.proc <- bulkAnalyseR::preprocessExpressionMatrix(exp, output.plot = TRUE)
```

### Creating the shiny app

Finally, we can create a shiny app. This example app can be found [here](https://bioinf.stemcells.cam.ac.uk/shiny/bulkAnalyseR/GEO/).

```{r generate app, eval = FALSE}
bulkAnalyseR::generateShinyApp(
  shiny.dir = "shiny_GEO",
  app.title = "Shiny app for visualisation of GEO data",
  modality = "RNA",
  expression.matrix = exp.proc,
  metadata = meta,
  organism = "hsapiens",
  org.db = "org.Hs.eg.db"
)
```

```{r sessionInfo}
sessionInfo()
```
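
As an optional check on the metadata step above, it can help to confirm that the sample names in the metadata line up with the columns of the expression matrix before generating the app. The following is a minimal sketch rather than part of the bulkAnalyseR API: the toy matrix, its sample names and the helper `check_metadata` are all hypothetical, and the check assumes that the first metadata column is meant to hold the matrix column names in the same order.

```r
# Toy expression matrix; the gene and sample names are made up for
# illustration and follow a "<condition>_<replicate>" pattern.
toy.exp <- matrix(
  1:12,
  nrow = 3,
  dimnames = list(
    c("geneA", "geneB", "geneC"),
    c("control_1", "control_2", "treated_1", "treated_2")
  )
)

# Derive the condition from the text before the first underscore,
# mirroring the strsplit() approach used for the GEO samples.
toy.meta <- data.frame(
  name = colnames(toy.exp),
  condition = vapply(
    strsplit(colnames(toy.exp), "_"),
    function(parts) parts[1],
    character(1)
  )
)

# Hypothetical helper: TRUE when the first metadata column matches the
# matrix column names exactly and in the same order.
check_metadata <- function(expression.matrix, metadata) {
  identical(colnames(expression.matrix), as.character(metadata[[1]]))
}

check_metadata(toy.exp, toy.meta)
```

The same helper could be called on `exp.proc` and `meta` from this vignette; a `FALSE` result would suggest reordering or renaming the metadata rows before passing them on.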