Type: | Package |
Title: | Predict Cancer Subtypes Based on TCGA Data using Machine Learning Method |
Version: | 1.0.0 |
Author: | Dadong Zhang <dadong.zhang.shared@gmail.com> |
Maintainer: | Dadong Zhang <dadong.zhang.shared@gmail.com> |
Description: | Provide functionality for cancer subtyping using nearest centroids or machine learning methods based on TCGA data. |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.1 |
URL: | https://github.com/DadongZ/OncoSubtype |
BugReports: | https://github.com/DadongZ/OncoSubtype/issues |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
LazyDataCompression: | xz |
Imports: | caret, randomForest, methods, e1071, pheatmap, tibble, dplyr, limma, rlang, Rdpack |
RdMacros: | Rdpack |
Depends: | SummarizedExperiment, R (≥ 3.63) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-03-22 16:31:11 UTC; dzhang4 |
Repository: | CRAN |
Date/Publication: | 2024-03-22 17:30:10 UTC |
Plot heatmap of the train set or test set
Description
Plot heatmap of the train set or test set
Usage
PlotHeat(object, set = "test", ...)
Arguments
object |
a SubtypeClass object |
set |
one of 'test', 'train', or 'both'. Default is 'test'. |
... |
Parameters passed to |
Value
a pheatmap object
Examples
## Not run:
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
rownames(data) <- rowData(example_fpkm)$external_gene_name
object <- ml_subtype(data, disease = 'LUSC')
PlotHeat(object, set = 'both', fontsize = 10, show_rownames = FALSE, show_colnames = FALSE)
## End(Not run)
Set the SubtypeClass
Description
Set the SubtypeClass
Value
an object of SubtypeClass with three empty slots
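For readers unfamiliar with S4, the sketch below shows how a class with empty slots can be declared and instantiated. It is an illustration only: the class name SubtypeSketch and all slot names are hypothetical, and the four slots follow the description in the Value sections of centroids_subtype and ml_subtype rather than the package's actual class definition.

```r
library(methods)

# Hypothetical class; slot names are illustrative, not the package's own.
setClass(
  "SubtypeSketch",
  slots = c(
    genes    = "character",  # features used for prediction
    subtypes = "character",  # predicted subtype per sample
    scores   = "matrix",     # prediction scores
    method   = "character"   # name of the method used
  )
)

# With no arguments, every slot starts at its type's empty prototype.
empty <- new("SubtypeSketch")
slotNames(empty)
length(empty@genes)
```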
Predict the subtypes of selected cancer type based on published papers
Description
Predict the subtypes of selected cancer type based on published papers
Usage
centroids_subtype(data, disease = "LUSC")
Arguments
data |
data set for which to predict subtypes: a numeric matrix with features as row names and samples as column names |
disease |
character string of the disease to predict subtypes for; currently supports 'LUSC' and 'LUAD' |
Value
an object of class "SubtypeClass" with four slots: genes used for prediction, predicted subtypes of samples, a matrix of prediction scores, and the method.
Examples
## Not run:
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
rownames(data) <- rowData(example_fpkm)$external_gene_name
centroids_subtype(data, disease = 'HNSC')
## End(Not run)
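The general idea of nearest-centroid subtyping can be sketched in base R. The source does not state how centroids_subtype scores samples, so this example assumes Pearson correlation against each centroid over the shared genes, which is one common choice. The helper name nearest_centroid and the toy data are hypothetical.

```r
# Hypothetical helper: assign each sample to the centroid it correlates with best.
# Assumption: similarity is Pearson correlation over genes present in both inputs.
nearest_centroid <- function(expr, centroids) {
  shared <- intersect(rownames(expr), rownames(centroids))
  stopifnot(length(shared) >= 3)
  # samples x subtypes matrix of correlations
  scores <- cor(expr[shared, , drop = FALSE], centroids[shared, , drop = FALSE])
  best <- colnames(scores)[max.col(scores, ties.method = "first")]
  list(genes = shared, subtypes = setNames(best, rownames(scores)), scores = scores)
}

# Toy centroids: 4 genes x 2 subtypes
centroids <- cbind(
  typeA = c(g1 = 2, g2 = 1, g3 = -1, g4 = -2),
  typeB = c(g1 = -2, g2 = -1, g3 = 1, g4 = 2)
)
# Toy expression: 4 genes x 2 samples
expr <- cbind(
  s1 = c(g1 = 1.5, g2 = 0.5, g3 = -0.5, g4 = -1.5),
  s2 = c(g1 = -1, g2 = 0, g3 = 0.2, g4 = 1.1)
)
res <- nearest_centroid(expr, centroids)
res$subtypes
```

In this toy case s1 follows the typeA pattern and s2 follows typeB, so each is assigned to that centroid.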
example FPKM data
Description
example FPKM data
Usage
example_fpkm
Format
SummarizedExperiment object
select highly variable genes from an expression matrix
Description
select highly variable genes from an expression matrix
Usage
get_hvg(data, top = 1000)
Arguments
data |
a (normalized) matrix with rownames of features and colnames of samples |
top |
number of top highly variable genes to output |
Value
subset of the input matrix containing the top-ranked genes by variance
Examples
## Not run:
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
get_hvg(data)
## End(Not run)
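Ranking features by variance can be illustrated in base R. This is a minimal sketch and not the package's implementation; the helper name top_variable_rows and the toy matrix are hypothetical.

```r
# Hypothetical helper: keep the `top` rows (features) with the largest variance.
top_variable_rows <- function(mat, top = 1000) {
  stopifnot(is.matrix(mat), is.numeric(mat))
  row_var <- apply(mat, 1, var, na.rm = TRUE)  # per-feature variance across samples
  keep <- order(row_var, decreasing = TRUE)[seq_len(min(top, nrow(mat)))]
  mat[keep, , drop = FALSE]
}

# Toy matrix: 4 features x 3 samples
toy <- rbind(
  geneA = c(1, 1, 1),    # variance 0
  geneB = c(0, 5, 10),   # variance 25
  geneC = c(2, 3, 4),    # variance 1
  geneD = c(0, 1, 0)     # variance 1/3
)
hvg <- top_variable_rows(toy, top = 2)
rownames(hvg)
```

With top = 2 the two highest-variance rows, geneB and geneC, are kept in that order.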
convert an expression matrix to median-centered values
Description
convert an expression matrix to median-centered values
Usage
get_median_centered(data, log2 = TRUE)
Arguments
data |
a numeric matrix or 'S4' object |
log2 |
logical, if 'TRUE' |
Value
median-centered expression matrix or an object with new slot "centered"
Examples
## Not run:
get_median_centered(example_fpkm)
## End(Not run)
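Median centering of a plain numeric matrix can be sketched as follows. The details are assumptions, since the source does not spell them out: the log transform here is log2(x + 1) with a pseudocount of 1, and centering is done per feature (row) across samples. The helper name median_center_rows and its arguments are hypothetical.

```r
# Hypothetical helper: optionally log2-transform, then subtract each
# feature's (row's) median across samples.
median_center_rows <- function(mat, take_log = TRUE, pseudocount = 1) {
  stopifnot(is.matrix(mat), is.numeric(mat))
  if (take_log) mat <- log2(mat + pseudocount)  # pseudocount is an assumption
  row_med <- apply(mat, 1, median, na.rm = TRUE)
  sweep(mat, 1, row_med, FUN = "-")
}

toy <- rbind(
  geneA = c(0, 1, 3),   # log2(x + 1) gives 0, 1, 2; median 1
  geneB = c(7, 7, 7)    # log2(x + 1) gives 3, 3, 3; median 3
)
centered <- median_center_rows(toy)
centered
```

After centering, every row has a median of zero, so geneA becomes -1, 0, 1 and geneB becomes all zeros.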
Predict the subtypes of selected cancer type
Description
Predict the subtypes of selected cancer type
Usage
get_rf_pred(train_set, test_set, method = "rf", seed = NULL)
Arguments
train_set |
training set with samples as row names; the first column is named 'mRNA_subtype' and the remaining columns are features holding expression values. |
test_set |
test set with rownames of features and colnames of samples. |
method |
character string of the method to use; currently supports 'rf'. |
seed |
integer seed to use. |
Value
a matrix with column names of subtypes and predicted probabilities.
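The two inputs use opposite orientations, which is easy to get wrong. The sketch below builds toy objects in the shapes described in Arguments and aligns them on shared features. All names and values are made up for illustration, and no model is fitted.

```r
set.seed(1)
genes <- c("g1", "g2", "g3")

# Training set: samples in rows; first column 'mRNA_subtype', then one column per feature.
train_set <- data.frame(
  mRNA_subtype = factor(c("classical", "basal", "classical", "basal")),
  matrix(rnorm(4 * 3), nrow = 4, dimnames = list(NULL, genes)),
  row.names = paste0("train", 1:4)
)

# Test set: features in rows, samples in columns.
test_set <- matrix(rnorm(3 * 2), nrow = 3,
                   dimnames = list(genes, c("test1", "test2")))

# Keep features present in both sets; transposing the test set puts samples
# in rows so its columns line up with the training features.
shared <- intersect(colnames(train_set)[-1], rownames(test_set))
test_by_sample <- t(test_set[shared, , drop = FALSE])
dim(test_by_sample)
```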
HNSC predictor centroids
Description
HNSC predictor centroids from https://www.nature.com/articles/nature14129
Usage
hnsc_centroids
Format
A tibble with 728 features and four subtypes.
Load Dataset from GitHub Repository
Description
Downloads a specified dataset from a GitHub repository if it is not already present in the specified local directory, then loads the dataset into the global environment. This function is designed to help manage package size by storing data externally and loading it on-demand.
Usage
load_dataset_from_github(disease, local_dir = path.expand(getwd()))
Arguments
disease |
A character string specifying the disease, which corresponds
to the name of the dataset to be loaded (e.g., "LUSC"). The function constructs
the filename as |
local_dir |
An optional character string specifying the path to the directory
where datasets should be stored locally. If not provided, defaults to a
subdirectory named |
Value
Invisible NULL. The function is primarily used for its side effect of loading a dataset into the global environment. However, the function itself does not return the dataset directly.
Examples
## Not run:
load_dataset_from_github("LUSC")
## End(Not run)
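The download-if-missing pattern described above can be sketched in base R. This is not the package's code: the helper name load_cached_rdata, the ".rda" file extension, and the base_url value are all hypothetical placeholders, and the objects are loaded into a caller-supplied environment rather than necessarily the global one.

```r
# Hypothetical helper: download `file_name` into `local_dir` only when it is
# missing, then load its objects into `envir`.
load_cached_rdata <- function(file_name, base_url, local_dir = tempdir(),
                              envir = globalenv()) {
  if (!dir.exists(local_dir)) dir.create(local_dir, recursive = TRUE)
  local_path <- file.path(local_dir, file_name)
  if (!file.exists(local_path)) {
    # Only reached on a cache miss; base_url is a placeholder, not the real repository path.
    utils::download.file(paste0(base_url, "/", file_name), local_path, mode = "wb")
  }
  load(local_path, envir = envir)
  invisible(NULL)
}

# Usage without network access: pre-populate the cache, then load from it.
cache_dir <- file.path(tempdir(), "cache_demo")
dir.create(cache_dir, showWarnings = FALSE)
demo_data <- data.frame(x = 1:3)
save(demo_data, file = file.path(cache_dir, "DEMO.rda"))

target <- new.env()
result <- load_cached_rdata("DEMO.rda", base_url = "https://example.invalid",
                            local_dir = cache_dir, envir = target)
ls(target)
```

Because the file already exists in the cache directory, no download is attempted, and the helper returns NULL invisibly after loading demo_data into target.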
LUAD predictor centroids
Description
LUAD predictor centroids from Wilkerson (2012)
Usage
luad_centroids
Format
A tibble with 506 features and three subtypes: bronchioid, magnoid, and squamoid.
LUSC predictor centroids
Description
LUSC predictor centroids from Wilkerson (2010)
Usage
lusc_centroids
Format
A tibble with 208 features and four subtypes: primitive, classical, secretory, and basal.
Predict the subtypes of selected cancer type using machine learning
Description
Predict the subtypes of selected cancer type using machine learning
Usage
ml_subtype(
data,
disease = "LUSC",
method = "rf",
removeBatch = TRUE,
seed = NULL
)
Arguments
data |
data set for which to predict subtypes: a numeric matrix with features as row names and samples as column names |
disease |
character string of the disease to predict subtypes for; currently supports 'LUSC', 'LUAD', and 'BLCA'. |
method |
character string of the method to use; currently supports 'rf'. |
removeBatch |
whether to perform batch effect correction using |
seed |
integer seed to use. |
Value
An object of class "SubtypeClass" with four slots: genes used for prediction, predicted subtypes of samples, a matrix of prediction scores, and the method.
References
- Wilkerson MJ, Yin X, Hayes D, et al. (2010). “Lung Squamous Cell Carcinoma mRNA Expression Subtypes Are Reproducible, Clinically Important, and Correspond to Normal Cell Types.” Clin Cancer Res, 16(19), 4864-4875.
- Wilkerson MJ, Yin X, Hayes D, et al. (2012). “Differential pathogenesis of lung adenocarcinoma subtypes involving sequence mutations, copy number, chromosomal instability, and methylation.” PLoS One, 7(5), e36530.
- Network TCGA (2015). “Comprehensive genomic characterization of head and neck squamous cell carcinomas.” Nature, 517(7536), 576-582.
Examples
## Not run:
library(OncoSubtype)
data <- get_median_centered(example_fpkm)
data <- assays(data)$centered
rownames(data) <- rowData(example_fpkm)$external_gene_name
ml_subtype(data, disease = 'LUAD', method = 'rf', seed = 123)
## End(Not run)