Title: | Viral Load and CD4 Lymphocytes Regression Models |
Version: | 1.3.4 |
Description: | Provides a comprehensive framework for building, evaluating, and visualizing regression models for analyzing viral load and CD4 (Cluster of Differentiation 4) lymphocytes data. It leverages the principles of the tidymodels ecosystem of Max Kuhn and Hadley Wickham (2020) https://www.tidymodels.org to offer a user-friendly experience in model development. This package includes functions for data preprocessing, feature engineering, model training, tuning, and evaluation, along with visualization tools to enhance the interpretation of model results. It is specifically designed for researchers in biostatistics, computational biology, and HIV research who aim to perform reproducible and rigorous analyses to gain insights into disease dynamics. The main focus is on improving the understanding of the relationships between viral load, CD4 lymphocytes, and other relevant covariates to contribute to HIV research and the visibility of vulnerable seropositive populations. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Suggests: | earth, nnet, rpart, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Imports: | baguette, Cubist, dials, dplyr, glmnet, hardhat, kernlab, kknn, magrittr, parsnip, purrr, ranger, recipes, rsample, rules, stats, tidyselect, tune, viraldomain, workflows, workflowsets |
NeedsCompilation: | no |
Packaged: | 2025-05-26 21:54:54 UTC; acua6 |
Author: | Juan Pablo Acuña González |
Maintainer: | Juan Pablo Acuña González <22253567@uagro.mx> |
Repository: | CRAN |
Date/Publication: | 2025-05-27 09:10:02 UTC |
Select best model
Description
Returns performance metrics for a selected model
Usage
viralmodel(output, modelo)
Arguments
output |
A non-ranked viraltab output |
modelo |
A character value naming the model to select, for example "simple_rf" |
Value
A table with the hyperparameters of a single model
Examples
library(dplyr)
library(magrittr)
library(baguette)
library(kernlab)
library(kknn)
library(ranger)
library(rules)
library(glmnet)
# Define the function to impute values in the undetectable range
set.seed(123)
impute_undetectable <- function(column) {
ifelse(column <= 40,
rexp(sum(column <= 40), rate = 1/13) + 1,
column)
}
# Apply the function to all vl columns using dplyr's mutate() and across()
library(viraldomain)
data("viral", package = "viraldomain")
viral_imputed <- viral %>%
mutate(across(starts_with("vl"), ~impute_undetectable(.x)))
traindata <- viral_imputed
semilla <- 1501
target <- "cd_2022"
viralvars <- c("vl_2019", "vl_2021", "vl_2022")
logbase <- 10
pliegues <- 2
repeticiones <- 1
rejilla <- 1
modelo <- "simple_rf"
set.seed(123)
viraltab(traindata, semilla, target, viralvars, logbase, pliegues,
repeticiones, rejilla, rank_output = FALSE) %>% viralmodel(modelo)
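The imputation step used in the example can be tried in isolation. The following is a minimal base R sketch, not the package's own code: the vector vl holds hypothetical toy values rather than rows of the viral dataset, and the replacement is done by logical indexing (one draw per undetectable value) instead of ifelse(). The rule itself is the one shown in impute_undetectable(): values at or below 40 are replaced by 1 plus an exponential draw with rate 1/13.

```r
# Toy viral-load vector (hypothetical values, not taken from the viral dataset)
set.seed(123)
vl <- c(20, 40, 150, 3200, 35, 78000)

# Flag the values in the undetectable range (<= 40, as in the example)
undetectable <- vl <= 40

# Replace each flagged value with 1 + an exponential draw (rate = 1/13, mean 13)
vl_imputed <- vl
vl_imputed[undetectable] <- rexp(sum(undetectable), rate = 1/13) + 1

vl_imputed
```

Values above 40 are left untouched, and every imputed value is strictly greater than 1 because exponential draws are positive.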
Predict Viral Load or CD4 Count using Many Models
Description
This function predicts viral load or CD4 count values based on multiple machine learning models using cross-validation. It allows users to specify two types of predictions: normal predictions on the full dataset or observation-by-observation (obs-by-obs) predictions.
Usage
viralpreds(output, semilla, data, prediction_type = "full")
Arguments
output |
A non-ranked viraltab output |
semilla |
An integer specifying the seed for random number generation to ensure reproducibility. |
data |
A data frame containing the predictors and the target variable. |
prediction_type |
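A character string specifying the type of predictions to perform: normal predictions on the full dataset (the default, "full") or observation-by-observation predictions. |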
Value
A list containing two elements: predictions (a vector of predicted values for the target variable) and RMSE (the root mean square error of the best model).
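As an illustration of the RMSE element, the sketch below computes a root mean square error in base R and stores it in a list shaped like the documented return value. The helper rmse() and the observed and predicted values are hypothetical and only for illustration; the sketch does not show how viralpreds computes its result internally, nor on which scale.

```r
# Root mean square error between observed and predicted values
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Hypothetical observed values and model predictions (toy numbers)
observed  <- c(500, 620, 410, 380)
predicted <- c(510, 600, 430, 390)

# A list with the two documented element names: predictions and RMSE
result <- list(predictions = predicted, RMSE = rmse(observed, predicted))

result$predictions
result$RMSE
```

The two elements are then read with the usual list accessors, result$predictions and result$RMSE.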
Examples
library(dplyr)
library(magrittr)
library(baguette)
library(kernlab)
library(kknn)
library(ranger)
library(rules)
library(glmnet)
# Define the function to impute values in the undetectable range
set.seed(123)
impute_undetectable <- function(column) {
ifelse(column <= 40,
rexp(sum(column <= 40), rate = 1/13) + 1,
column)
}
# Apply the function to all vl columns using dplyr's mutate() and across()
library(viraldomain)
data("viral", package = "viraldomain")
viral_imputed <- viral %>%
mutate(across(starts_with("vl"), ~impute_undetectable(.x)))
traindata <- viral_imputed
target <- "cd_2022"
viralvars <- c("vl_2019", "vl_2021", "vl_2022")
logbase <- 10
pliegues <- 5
repeticiones <- 2
rejilla <- 2
semilla <- 123
viraltab(traindata, semilla, target, viralvars, logbase, pliegues,
repeticiones, rejilla, rank_output = FALSE) %>%
viralpreds(semilla, traindata, prediction_type = "full")
Competing models table
Description
Trains and optimizes a series of regression models for viral load or CD4 counts
Usage
viraltab(
traindata,
semilla,
target,
viralvars,
logbase,
pliegues,
repeticiones,
rejilla,
rank_output = TRUE
)
Arguments
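traindata |
A data frame containing the predictors and the target variable |
semilla |
A numeric value used as the seed for random number generation, for reproducibility |
target |
A character value naming the target variable, for example "cd_2022" |
viralvars |
Vector of variable names related to viral data. |
logbase |
The base for logarithmic transformations. |
pliegues |
A numeric value. The name is Spanish for "folds", so this is presumably the number of cross-validation folds |
repeticiones |
A numeric value. The name is Spanish for "repetitions", so this is presumably the number of cross-validation repeats |
rejilla |
A numeric value. The name is Spanish for "grid", so this is presumably the size of the tuning grid |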
rank_output |
Logical value. If TRUE, returns ranked output; if FALSE, returns unranked output. |
Value
A table of competing models
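The logbase argument sets the base for logarithmic transformations. As a rough base R sketch of what a base-10 log transform and its inverse look like (the vector vl is a hypothetical toy example, and the sketch makes no claim about which columns viraltab transforms or when):

```r
# Base of the logarithm, as in the examples (logbase <- 10)
logbase <- 10

# Hypothetical viral-load values (toy numbers)
vl <- c(50, 1200, 35000, 1e6)

# Forward transform: logarithm in the chosen base
vl_log <- log(vl, base = logbase)

# Inverse transform: raise the base to the transformed values
vl_back <- logbase^vl_log

# The round trip recovers the original values up to floating-point error
all.equal(vl_back, vl)
```

A base-10 scale is convenient for viral load because values can span several orders of magnitude.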
Examples
library(dplyr)
library(magrittr)
library(baguette)
library(kernlab)
library(kknn)
library(ranger)
library(rules)
library(glmnet)
# Define the function to impute values in the undetectable range
impute_undetectable <- function(column) {
set.seed(123)
ifelse(column <= 40,
rexp(sum(column <= 40), rate = 1/13) + 1,
column)
}
library(viraldomain)
data("viral", package = "viraldomain")
viral_imputed <- viral %>%
mutate(across(starts_with("vl"), ~impute_undetectable(.x)))
traindata <- viral_imputed
semilla <- 1501
target <- "cd_2022"
viralvars <- c("vl_2019", "vl_2021", "vl_2022")
logbase <- 10
pliegues <- 2
repeticiones <- 1
rejilla <- 1
set.seed(123)
viraltab(traindata, semilla, target, viralvars, logbase, pliegues,
repeticiones, rejilla, rank_output = TRUE)
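The example sets pliegues <- 2 and repeticiones <- 1. Assuming these correspond to the number of cross-validation folds and repeats (an assumption based on the Spanish names, not something this page states), the following base R sketch shows how rows could be assigned to folds. It is an illustration of repeated v-fold assignment in general, not the package's resampling code, and n is a hypothetical sample size.

```r
set.seed(1501)

n            <- 10  # hypothetical number of rows
pliegues     <- 2   # assumed: number of folds
repeticiones <- 1   # assumed: number of repeats

# For each repeat, shuffle a balanced vector of fold labels across the rows
folds <- lapply(seq_len(repeticiones), function(r) {
  sample(rep(seq_len(pliegues), length.out = n))
})

# Fold sizes in the first repeat
table(folds[[1]])
```

Each row belongs to exactly one fold per repeat; in turn, each fold is held out for assessment while the remaining folds are used for fitting.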
Competing models plot
Description
Plots the rankings of a series of regression models for viral load or CD4 counts
Usage
viralvis(output)
Arguments
output |
A non-ranked viraltab output |
Value
A plot of the model rankings
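To give a sense of what a ranking of competing models means, here is a minimal base R sketch that orders models by RMSE and draws a dot chart. The metrics table is entirely hypothetical: only "simple_rf" appears on this page, the other model names and all RMSE values are made up for illustration, and the plot produced by viralvis may look different.

```r
# Hypothetical metrics table (toy values; "model_b" and "model_c" are placeholders)
metrics <- data.frame(
  model = c("simple_rf", "model_b", "model_c"),
  rmse  = c(120.5, 98.2, 143.0),
  stringsAsFactors = FALSE
)

# Rank models from lowest to highest RMSE (lower is better)
ranked <- metrics[order(metrics$rmse), ]
ranked$rank <- seq_len(nrow(ranked))

# Simple dot chart of RMSE by model, in ranked order
dotchart(ranked$rmse, labels = ranked$model, xlab = "RMSE")

ranked
```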
Examples
library(dplyr)
library(magrittr)
library(baguette)
library(kernlab)
library(kknn)
library(ranger)
library(rules)
library(glmnet)
# Define the function to impute values in the undetectable range
set.seed(123)
impute_undetectable <- function(column) {
ifelse(column <= 40,
rexp(sum(column <= 40), rate = 1/13) + 1,
column)
}
# Apply the function to all vl columns using dplyr's mutate() and across()
library(viraldomain)
data("viral", package = "viraldomain")
viral_imputed <- viral %>%
mutate(across(starts_with("vl"), ~impute_undetectable(.x)))
traindata <- viral_imputed
semilla <- 1501
target <- "cd_2022"
viralvars <- c("vl_2019", "vl_2021", "vl_2022")
logbase <- 10
pliegues <- 2
repeticiones <- 1
rejilla <- 1
set.seed(123)
viraltab(traindata, semilla, target, viralvars, logbase, pliegues,
repeticiones, rejilla, rank_output = FALSE) %>% viralvis()