Title: | Inference of Causal Links Between a Network and an External Variable |
Version: | 0.1.1 |
Description: | The 'NetCoupler' algorithm identifies potential direct effects of correlated, high-dimensional variables formed as a network with an external variable. The external variable may act as the dependent/response variable or as an independent/predictor variable to the network. |
License: | MIT + file LICENSE |
URL: | https://github.com/NetCoupler/NetCoupler, https://netcoupler.github.io/NetCoupler/ |
BugReports: | https://github.com/NetCoupler/NetCoupler/issues |
Depends: | R (≥ 3.5.0) |
Imports: | checkmate, dplyr, ids, igraph, lifecycle, magrittr, pcalg, ppcor, purrr, rlang (≥ 0.4.6), stats, tibble, tidyselect, utils, tidygraph |
Suggests: | broom, furrr, knitr, rmarkdown, spelling, testthat (≥ 2.1.0) |
VignetteBuilder: | knitr |
RdMacros: | lifecycle |
ByteCompile: | true |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Language: | en-US |
NeedsCompilation: | no |
Packaged: | 2025-05-20 08:30:18 UTC; luke |
Author: | Luke Johnston |
Maintainer: | Luke Johnston <lwjohnst@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-05-20 10:40:02 UTC |
NetCoupler: Inference of Causal Links Between a Network and an External Variable
Description
The 'NetCoupler' algorithm identifies potential direct effects of correlated, high-dimensional variables formed as a network with an external variable. The external variable may act as the dependent/response variable or as an independent/predictor variable to the network.
Author(s)
Maintainer: Luke Johnston lwjohnst@gmail.com (ORCID) [copyright holder]
Authors:
Clemens Wittenbecher Clemens.Wittenbecher@dife.de (ORCID) [copyright holder]
Other contributors:
Fabian Eichelmann Fabian.Eichelmann@dife.de [contributor]
Helena Zacharias helena.zacharias@helmholtz-muenchen.de [contributor]
Daniel Ibsen dbi@ph.au.dk (ORCID) [contributor]
See Also
Useful links:
Report bugs at https://github.com/NetCoupler/NetCoupler/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Convert network graphs to edge tables as tibbles/data.frames.
Description
Usage
as_edge_tbl(network_object)
Arguments
network_object | Network graph from |
Value
A tibble, with at least two columns:
- source_node: The starting node (variable).
- target_node: The ending node (variable) that links with the source node.
- adjacency_weight: (Optional) The "weight" given to the edge, which represents the strength of the link between the two nodes.
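The edge table is a long-format view of the network's weighted adjacency matrix. As a minimal base-R sketch, assuming the network is held as a symmetric adjacency matrix with named rows and columns (an illustrative representation; adjacency_to_edge_tbl() is a hypothetical helper, not part of NetCoupler), the same three columns can be derived like this:

```r
# Hypothetical helper (not part of NetCoupler): turn a symmetric, weighted
# adjacency matrix into an edge table with one row per undirected edge.
adjacency_to_edge_tbl <- function(adj) {
  stopifnot(is.matrix(adj), isSymmetric(adj), !is.null(rownames(adj)))
  # Keep only the upper triangle so each undirected edge appears once;
  # zero entries mean "no edge".
  idx <- which(upper.tri(adj) & adj != 0, arr.ind = TRUE)
  data.frame(
    source_node = rownames(adj)[idx[, 1]],
    target_node = colnames(adj)[idx[, 2]],
    adjacency_weight = adj[idx],
    stringsAsFactors = FALSE
  )
}

nodes <- c("metabolite_1", "metabolite_2", "metabolite_3")
adj <- matrix(
  c(0,   0.5, 0,
    0.5, 0,   0.2,
    0,   0.2, 0),
  nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes)
)
edges <- adjacency_to_edge_tbl(adj)
print(edges)
```

Keeping only the upper triangle avoids listing each undirected edge twice.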
See Also
See nc_estimate_links for examples on using NetCoupler.
Classification options for direct, ambiguous, and no effect.
Description
Classification options for direct, ambiguous, and no effect.
Usage
classify_options(
single_metabolite_threshold = 0.05,
network_threshold = 0.1,
direct_effect_adjustment = NA
)
Arguments
single_metabolite_threshold, network_threshold, direct_effect_adjustment | See the |
Value
List with options for the classification.
Compute model estimates between an external (exposure or outcome) variable and a network.
Description
These are the main functions for identifying potential links between external factors and the network. There are two functions to estimate and classify links:
- nc_estimate_exposure_links(): Computes the model estimates for the exposure side.
- nc_estimate_outcome_links(): Computes the model estimates for the outcome side.
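Both functions fit several models per index node. A minimal base-R sketch of that idea follows. It assumes, for illustration only, that on the exposure side each index node is regressed on the exposure plus every possible subset of its network neighbours, with lm() as the model; fit_neighbour_subsets() and the toy data are hypothetical, and NetCoupler's internals may differ:

```r
# Hypothetical sketch: for one index node, fit index_node ~ exposure + S for
# every subset S of the node's neighbours and keep the exposure estimate.
fit_neighbour_subsets <- function(data, index_node, exposure, neighbours) {
  # All subsets of the neighbours, starting with the empty set.
  subsets <- c(
    list(character(0)),
    unlist(lapply(seq_along(neighbours),
                  function(k) combn(neighbours, k, simplify = FALSE)),
           recursive = FALSE)
  )
  rows <- lapply(subsets, function(s) {
    rhs <- paste(c(exposure, s), collapse = " + ")
    fit <- lm(as.formula(paste(index_node, "~", rhs)), data = data)
    coefs <- summary(fit)$coefficients
    data.frame(
      adjusted_for = if (length(s) == 0) "none" else paste(s, collapse = ", "),
      estimate = coefs[exposure, "Estimate"],
      std_error = coefs[exposure, "Std. Error"],
      p_value = coefs[exposure, "Pr(>|t|)"],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy data: the exposure drives metabolite_1, which drives its two neighbours.
set.seed(1)
n <- 200
exposure <- rnorm(n)
metabolite_1 <- 0.8 * exposure + rnorm(n)
metabolite_2 <- 0.5 * metabolite_1 + rnorm(n)
metabolite_3 <- 0.5 * metabolite_1 + rnorm(n)
toy <- data.frame(exposure, metabolite_1, metabolite_2, metabolite_3)

models <- fit_neighbour_subsets(toy, "metabolite_1", "exposure",
                                c("metabolite_2", "metabolite_3"))
print(models)
```

With two neighbours this yields 2^2 = 4 models; the first row is the unadjusted ("no neighbour node adjusted") model.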
Usage
nc_estimate_exposure_links(
data,
edge_tbl,
exposure,
adjustment_vars = NA,
model_function,
model_arg_list = NULL,
exponentiate = FALSE,
classify_option_list = classify_options()
)
nc_estimate_outcome_links(
data,
edge_tbl,
outcome,
adjustment_vars = NA,
model_function,
model_arg_list = NULL,
exponentiate = FALSE,
classify_option_list = classify_options()
)
Arguments
data | The data.frame or tibble that contains the variables of interest, including the variables used to make the network. |
edge_tbl | Output graph object from |
exposure, outcome | Character. The exposure or outcome variable of interest. |
adjustment_vars | Optional. Variables to adjust for in the models. |
model_function | A function for the model to use (e.g. |
model_arg_list | Optional. A list containing the named arguments that will be passed to the model function. A simple example would be |
exponentiate | Logical. Whether to exponentiate the log estimates, as computed with e.g. logistic regression models. |
classify_option_list | A list with classification options for direct, ambiguous, or no effects. Used with the |
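To illustrate what exponentiate = TRUE does: a logistic regression reports estimates on the log-odds scale, and exponentiating them gives odds ratios. A minimal base-R sketch with simulated data (variable names are hypothetical):

```r
set.seed(42)
n <- 500
metabolite <- rnorm(n)
# Simulate a binary outcome whose log-odds rise by 0.7 per unit of metabolite.
outcome_binary <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.7 * metabolite))
fit <- glm(outcome_binary ~ metabolite, family = binomial(link = "logit"))

log_odds <- unname(coef(fit)["metabolite"])  # estimate on the log-odds scale
odds_ratio <- exp(log_odds)                  # exponentiated: odds ratio
print(c(log_odds = log_odds, odds_ratio = odds_ratio))
```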
Value
Outputs a tibble that contains the model estimates from either the exposure or outcome side of the network, as well as the effect classification. Each row represents the "no neighbour node adjusted" model and has the results for the outcome/exposure to index node pathway. The columns are:
- outcome or exposure: The name of the variable used as the external variable.
- index_node: The name of the metabolite used as the index node from the network. In combination with the outcome/exposure variable, they represent the individual model used for the classification.
- estimate: The estimate from the outcome/exposure and index node model.
- std_error: The standard error from the outcome/exposure and index node model.
- fdr_p_value: The False Discovery Rate-adjusted p-value from the outcome/exposure and index node model.
- effect: The NetCoupler classified effect between the index node and the outcome/exposure. Effects are classified as "direct" (there is a probable link based on the given thresholds), "ambigious" (there is a potential link but not all thresholds were passed), and "none" (no potential link seen).
The tibble output also has an attribute that contains all the models generated before classification. Access it with attr(output, "all_models_df").
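The fdr_p_value column and the all_models_df attribute can be mimicked in base R. The sketch below uses made-up p-values for three hypothetical index nodes and stats::p.adjust() with method = "fdr" (Benjamini-Hochberg); whether NetCoupler applies exactly this adjustment is an assumption here. The attribute name matches the one documented above:

```r
# Illustrative numbers only: unadjusted results for three hypothetical nodes.
results <- data.frame(
  exposure = "exposure",
  index_node = c("metabolite_1", "metabolite_2", "metabolite_3"),
  estimate = c(0.42, 0.05, -0.30),
  p_value = c(0.001, 0.60, 0.02),
  stringsAsFactors = FALSE
)
# Benjamini-Hochberg false discovery rate adjustment.
results$fdr_p_value <- p.adjust(results$p_value, method = "fdr")

# Store the full set of models as an attribute, then read it back.
all_models <- results  # stand-in for the larger table of all fitted models
attr(results, "all_models_df") <- all_models
str(attr(results, "all_models_df"))
```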
See Also
The vignette("examples") article has more details on how to use NetCoupler with different models.
Examples
standardized_data <- simulated_data %>%
nc_standardize(starts_with("metabolite"))
metabolite_network <- simulated_data %>%
nc_standardize(starts_with("metabolite"),
regressed_on = "age") %>%
nc_estimate_network(starts_with("metabolite"))
edge_table <- as_edge_tbl(metabolite_network)
results <- standardized_data %>%
nc_estimate_exposure_links(
edge_tbl = edge_table,
exposure = "exposure",
model_function = lm
)
results
# Get results of all models used prior to classification
attr(results, "all_models_df")
Create an estimate of the metabolic network as an undirected graph.
Description
The main NetCoupler network creator. It uses the input data to estimate the underlying undirected graph. By default, possible edges are calculated with the PC algorithm, implemented within NetCoupler in pc_estimate_undirected_graph(). Any missing values in the input data are removed by this function, since some computations can't handle missingness.
Usage
nc_estimate_network(data, cols = everything(), alpha = 0.01)
Arguments
data | Data that would form the underlying network. |
cols | < |
alpha | The alpha level used to test whether an edge exists. Default is 0.01. |
Value
Outputs a tidygraph::tbl_graph() with the start and end nodes, as well as the edge weights.
See Also
See nc_estimate_links for examples on using NetCoupler and pc_estimate_undirected_graph for more details on the PC-algorithm network estimation method.
Standardize the metabolic variables.
Description
Standardizes the metabolic variables in one of two ways: 1) log()-transforming and then applying scale() (mean-centering and scaling by the standard deviation), or 2) if regressed_on variables are given, log-transforming, running a linear regression to obtain the stats::residuals(), and finally scaling them. Use regressed_on to try to remove the influence of potential confounding.
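Both standardization paths can be sketched in base R. This is a minimal illustration, not NetCoupler's implementation: it assumes strictly positive metabolite values (required by log()) and no missing data, and standardize_sketch() is a hypothetical helper:

```r
# Hypothetical helper mirroring the two standardization paths.
standardize_sketch <- function(x, confounders = NULL) {
  logged <- log(x)
  if (is.null(confounders)) {
    # Path 1: log-transform, then mean-center and scale by the SD.
    return(as.vector(scale(logged)))
  }
  # Path 2: log-transform, regress on the confounders, scale the residuals.
  fit <- lm(logged ~ ., data = data.frame(logged = logged, confounders))
  as.vector(scale(residuals(fit)))
}

set.seed(7)
age <- runif(100, 40, 70)
metabolite_1 <- exp(0.02 * age + rnorm(100, sd = 0.3))

plain <- standardize_sketch(metabolite_1)
adjusted <- standardize_sketch(metabolite_1, data.frame(age = age))
round(c(mean = mean(adjusted), sd = sd(adjusted), cor_age = cor(adjusted, age)), 3)
```

Because the residuals come from a model with an intercept and age, the adjusted values are uncorrelated with age by construction.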
Usage
nc_standardize(data, cols = everything(), regressed_on = NULL)
Arguments
data | Data frame. |
cols | Metabolic variables that will make up the network. |
regressed_on | Optional. A character vector of variables to regress the metabolic variables on. Use if you want to standardize the metabolic variables on variables that are known to influence them, e.g. sex or age. Calculates the residuals from a linear regression model. |
Value
Outputs a tibble object, with the original metabolic variables now standardized.
See Also
nc_estimate_links for more detailed examples or the vignette("NetCoupler").
Examples
# Don't regress on any variable
simulated_data %>%
nc_standardize(starts_with("metabolite_"))
# Extract residuals by regressing on a variable
simulated_data %>%
nc_standardize(starts_with("metabolite_"), "age")
# Works with factors too
simulated_data %>%
dplyr::mutate(Sex = as.factor(sample(rep(c("F", "M"), times = nrow(.) / 2)))) %>%
nc_standardize(starts_with("metabolite_"), c("age", "Sex"))
Estimate the undirected graph of the metabolic data.
Description
Uses the PC-algorithm and is mostly a wrapper around pcalg::skeleton().
Usage
pc_estimate_undirected_graph(data, alpha = 0.01)
Arguments
data | Input numeric data that forms the basis of the underlying graph. |
alpha | Significance level applied to each conditional independence test used to decide whether an edge exists. |
Details
This function estimates the "skeleton of a DAG", meaning a graph without arrowheads, aka an undirected graph. The default estimation method used is the "PC-stable" method, which estimates the order-independent skeleton of the DAG, meaning the order of the variables given does not impact the results (older versions of the algorithm were order-dependent). The method also assumes no latent variables.
An edge is determined by testing for conditional dependence between two nodes with pcalg::gaussCItest(). This test checks whether the partial correlation between the two nodes, computed from the correlation matrix of the data, is zero; the nodes are considered conditionally independent when that null hypothesis is not rejected at the given alpha. An edge is kept between the start and end nodes only if they are conditionally dependent given every subset of the remaining (adjacent) variables that the algorithm tests, and it is removed as soon as one subset makes them conditionally independent.
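To make the edge decision concrete, here is a compact base-R sketch of a Gaussian conditional independence test (partial correlation from the inverse of the correlation sub-matrix, then Fisher's z-transform, the approach pcalg::gaussCItest() is based on) combined with a simplified PC-stable skeleton search. The helper names ci_pvalue() and pc_skeleton_sketch() are hypothetical and the code is for illustration only; for real analyses use pcalg::skeleton() or pc_estimate_undirected_graph():

```r
# p-value for H0: partial correlation of variables i and j given set s is zero.
ci_pvalue <- function(i, j, s, C, n) {
  idx <- c(i, j, s)
  P <- solve(C[idx, idx, drop = FALSE])      # precision of the sub-matrix
  r <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])    # partial correlation
  r <- min(max(r, -1 + 1e-10), 1 - 1e-10)    # keep atanh() finite
  z <- sqrt(n - length(s) - 3) * atanh(r)    # Fisher z statistic
  2 * pnorm(abs(z), lower.tail = FALSE)
}

# Simplified PC-stable skeleton: start from a complete graph and delete the
# edge i-j as soon as some conditioning set of size l (drawn from i's
# neighbours, frozen at the start of each level) separates i and j.
pc_skeleton_sketch <- function(data, alpha = 0.01) {
  C <- cor(data)
  n <- nrow(data)
  p <- ncol(data)
  adj <- matrix(TRUE, p, p, dimnames = list(colnames(data), colnames(data)))
  diag(adj) <- FALSE
  l <- 0
  repeat {
    frozen <- adj                             # makes the result order-independent
    any_testable <- FALSE
    for (i in seq_len(p)) for (j in seq_len(p)) {
      if (i == j || !adj[i, j]) next
      nb <- setdiff(which(frozen[i, ]), j)
      if (length(nb) < l) next
      any_testable <- TRUE
      sets <- if (l == 0) list(integer(0)) else
        if (length(nb) == 1) list(nb) else combn(nb, l, simplify = FALSE)
      for (s in sets) {
        if (ci_pvalue(i, j, s, C, n) > alpha) {
          adj[i, j] <- adj[j, i] <- FALSE
          break
        }
      }
    }
    if (!any_testable) break
    l <- l + 1
  }
  adj
}

# Chain x -> y -> z: x and z are independent given y, so x-z should be dropped.
set.seed(123)
n <- 1000
x <- rnorm(n)
y <- 0.8 * x + rnorm(n)
z <- 0.8 * y + rnorm(n)
skel <- pc_skeleton_sketch(data.frame(x, y, z), alpha = 0.001)
print(skel)
```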
Value
A pcAlgo object that contains the DAG skeleton, aka undirected graph.
See Also
The help documentation of pcalg::skeleton() has more details.
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- tibble
- tidyselect: all_of, any_of, contains, ends_with, everything, last_col, matches, num_range, starts_with
Simulated dataset with an underlying Directed Graph structure for the metabolites.
Description
Simulated dataset with an underlying Directed Graph structure for the metabolites.
Usage
simulated_data
Format
The simulated dataset is a tibble with the following variables:
- Two outcome variables (outcome_continuous and outcome_binary), along with a survival time (outcome_event_time) that is used for the outcome_binary variable.
- A generic, continuous exposure variable.
- 12 metabolite_* variables.
- An age variable used as a confounder.