Title: | Network Analysis and Causal Inference Through Structural Equation Modeling |
Version: | 1.2.3 |
Date: | 2025-01-20 |
Description: | Estimate networks and causal relationships in complex systems through Structural Equation Modeling. This package also includes functions for importing, weight, manipulate, and fit biological network models within the Structural Equation Modeling framework as outlined in the Supplementary Material of Grassi M, Palluzzi F, Tarantino B (2022) <doi:10.1093/bioinformatics/btac567>. |
URL: | https://github.com/fernandoPalluzzi/SEMgraph |
Depends: | igraph (≥ 1.6.0), lavaan (≥ 0.6-1), R (≥ 4.0) |
Imports: | aspect, boot, corpcor, dagitty, flip, gdata, ggm, glasso, glmnet, graph, mgcv, mvtnorm, pbapply, protoclust, RBGL, Rgraphviz |
License: | GPL-3 |
LazyData: | true |
Encoding: | UTF-8 |
NeedsCompilation: | no |
RoxygenNote: | 7.3.2 |
Maintainer: | Barbara Tarantino <barbara.tarantino01@universitadipavia.it> |
Repository: | CRAN |
Author: | Mario Grassi [aut], Fernando Palluzzi [aut], Barbara Tarantino [cre] |
Packaged: | 2025-01-29 13:27:28 UTC; mario |
Date/Publication: | 2025-01-29 18:50:05 UTC |
Compute the Average Causal Effect (ACE) for a given source-sink pair
Description
Compute total effects as ACEs of variables X
on variables Y in a directed acyclic graph (DAG). The ACE will be estimated
as the path coefficient of X (i.e., theta) in the linear equation
Y ~ X + Z. The set Z is defined as the adjustment (or conditioning) set of
Y over X, applying various adjustement sets. Standard errors (SE),
for each ACE, are computed following the lm
standard procedure
or a bootstrap-based procedure (see boot
for details).
Usage
SEMace(
graph,
data,
group = NULL,
type = "parents",
effect = "all",
method = "BH",
alpha = 0.05,
boot = NULL,
...
)
Arguments
graph |
An igraph object. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes (variables). |
group |
A binary vector. This vector must be as long as the
number of subjects. Each vector element must be 1 for cases and 0
for control subjects. If |
type |
character Conditioning set Z. If "parents" (default) the Pearl's back-door set (Pearl, 1998), "minimal" the dagitty minimal set (Perkovic et al, 2018), or "optimal" the O-set with the smallest asymptotic variance (Witte et al, 2020) are computed. |
effect |
character X to Y effect. If "all" (default) all effects from X to Y, "source2sink" only effects from source X to sink Y, or "direct" only direct effects from X to Y are computed. |
method |
Multiple testing correction method. One of the values
available in |
alpha |
Significance level for ACE selection (by default,
|
boot |
The number of bootstrap samplings enabling bootstrap
computation of ACE standard errors. If |
... |
Currently ignored. |
Value
A data.frame of ACE estimates between network sources and sinks.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Pearl J (1998). Graphs, Causality, and Structural Equation Models. Sociological Methods & Research, 27(2):226-284. <https://doi.org/10.1177/0049124198027002004>
Perkovic E, Textor J, Kalisch M, Maathuis MH (2018). Complete graphical characterization and construction of adjustment sets in Markov equivalence classes of ancestral graphs. Journal of Machine Learning Research, 18:1-62. <http://jmlr.org/papers/v18/16-319.html>
Witte J, Henckel L, Maathuis MH, Didelez V (2020). On efficient adjustment in causal graphs. Journal of Machine Learning Research, 21:1-45. <http://jmlr.org/papers/v21/20-175.htm>
Examples
# ACE without group, O-set, all effects:
ace1 <- SEMace(graph = sachs$graph, data = log(sachs$pkc),
group = NULL, type = "optimal", effect = "all",
method = "BH", alpha = 0.05, boot = NULL)
print(ace1)
# ACE with group perturbation, Pa-set, direct effects:
ace2 <- SEMace(graph = sachs$graph, data = log(sachs$pkc),
group = sachs$group, type = "parents", effect = "direct",
method = "none", alpha = 0.05, boot = NULL)
print(ace2)
Bow-free covariance search and data de-correlation
Description
SEMbap()
function implements different deconfounding
methods to adjust the data matrix by removing latent sources of confounding
encoded in them. The selected methods are either based on: (i) Bow-free
Acyclic Paths (BAP) search, (ii) LVs proxies as additional source nodes of
the data matrix, Y or (iii) spectral transformation of Y.
Usage
SEMbap(
graph,
data,
group = NULL,
dalgo = "cggm",
method = "BH",
alpha = 0.05,
hcount = "auto",
cmax = Inf,
limit = 200,
verbose = FALSE,
...
)
Arguments
graph |
An igraph object. |
data |
A matrix whith rows corresponding to subjects, and columns to graph nodes (variables). |
group |
A binary vector. This vector must be as long as the
number of subjects. Each vector element must be 1 for cases and 0
for control subjects. If |
dalgo |
Deconfounding method. Four algorithms are available:
|
method |
Multiple testing correction method. One of the values
available in |
alpha |
Significance level for false discovery rate (FDR) used
for d-separation test. This argument is used to
control data de-correlation. A higher |
hcount |
The number of latent (or hidden) variables. By default
|
cmax |
Maximum number of parents set, C. This parameter can be used to perform only those tests where the number of conditioning variables does not exceed the given value. High-dimensional conditional independence tests can be very unreliable. By default, cmax = Inf. |
limit |
An integer value corresponding to the graph size (vcount)
tolerance. Beyond this limit, the precision matrix is estimated by
"glasso" algorithm (FHT, 2008) to reduce the computational burden of the
exaustive BAP search of the |
verbose |
A logical value. If FALSE (default), the processed graphs will not be plotted to screen. |
... |
Currently ignored. |
Details
Missing edges in causal network inference using a directed acyclic
graph (DAG) are frequently hidden by unmeasured confounding variables.
A Bow-free Acyclic Paths (BAP) search is performed with d-separation tests
between all pairs of variables with missing connection in the input DAG,
adding a bidirected edge (i.e., bow-free covariance) to the DAG when there
is an association between them. The d-separation test evaluates if two
variables (Y1, Y2) in a DAG are conditionally independent for a given
conditioning set, C represented in a DAG by the union of the parent sets
of Y1 and Y2 (Shipley, 2000).
A new bow-free covariance is added if there is a significant (Y1, Y2)
association at a significance level alpha
, after multiple testing
correction. The selected covariance between pairs of nodes is interpreted
as the effect of a latent variable (LV) acting on both nodes; i.e., the LV
is an unobserved confounder. BAP-based algorithms adjust (or de-correlate)
the observed data matrix by conditioning out the latent triggers responsible
for the nuisance edges.
For "pc" algorithm the number of hidden proxies, q is determined by a permutation
method. It compares the singular values to what they would be if the variables
were independent, which is estimated by permuting the columns of the data matrix,
Y and selects components if their singular values are larger than those of the
permuted data (for a review see Dobriban, 2020).
While for "glpc" algorithm, q is determined by the number of clusters by
spectral clustering through cluster_leading_eigen
function.
If the input graph is not acyclic, a warning message will be raised, and a
cycle-breaking algorithm will be applied (see graph2dag
for details).
Value
A list of four objects:
"dag", the directed acyclic graph (DAG) extracted from input graph. If (dalgo = "glpc" or "pc"), the DAG also includes LVs as source nodes.
"guu", the bow-free covariance graph, BAP = dag + guu. If (dalgo = "pc" or "trim"), guu is equal to NULL
"adj", the adjacency matrix of selected bow-free covariances; i.e, the missing edges selected after multiple testing correction. If (dalgo = "pc" or "trim"), adj matrix is equal to NULL.
"data", the adjusted (de-correlated) data matrix or if (dalgo = "glpc", or "pc"), the combined data matrix, where the first columns represent LVs scores and the other columns are the raw data.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Grassi M, Palluzzi F, Tarantino B (2022). SEMgraph: An R Package for Causal Network Analysis of High-Throughput Data with Structural Equation Models. Bioinformatics, 38(20), 4829–4830. <https://doi.org/10.1093/bioinformatics/btac567>
Shipley B (2000). A new inferential test for path models based on DAGs. Structural Equation Modeling, 7(2), 206-218. <https://doi.org/10.1207/S15328007SEM0702_4>
Jiang B, Ding C, Bin L, Tang J (2013). Graph-Laplacian PCA: Closed-Form Solution and Robustness. IEEE Conference on Computer Vision and Pattern Recognition, 3492-3498. <https://doi.org/10.1109/CVPR.2013.448>
Ćevid D, Bühlmann P, Meinshausen N (2020). Spectral deconfounding via perturbed sparse linear models. J. Mach. Learn. Res, 21(232), 1-41. <http://jmlr.org/papers/v21/19-545.html>
Dobriban E (2020). Permuatation methods for Factor Analysis and PCA. Ann. Statist. 48(5): 2824-2847 <https://doi.org/10.1214/19-AOS1907>
Friedman J, Hastie T, Tibshirani R (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441. <https://doi.org/10.1093/biostatistics/kxm045>
Examples
#Set function param
graph <- sachs$graph
data <- log(sachs$pkc)
group <-sachs$group
# BAP decounfounding with CGGM (default)
bap <- SEMbap(graph, data, verbose = TRUE)
# SVD decounfounding with trim method
svd <- SEMbap(graph, data, dalgo = "trim")
# Model fitting (with node-perturbation)
sem1 <- SEMrun(graph, data, group)
bap1 <- SEMrun(bap$dag, bap$data, group)
svd1 <- SEMrun(svd$dag, svd$data, group)
Estimate a DAG from an input (or empty) graph
Description
Two-step extraction of the optimal DAG from an input (or empty)
graph, using in step 1) graph topological order or bottom-up search order,
and in step 2) parent recovery with the LASSO-based algorithm (FHT, 2010),
implemented in glmnet
.
Usage
SEMdag(
graph,
data,
LO = "TO",
beta = 0,
eta = 0,
lambdas = NA,
penalty = TRUE,
verbose = FALSE,
...
)
Arguments
graph |
An igraph object or a graph with no edges (make_empty_graph(n=0)). |
data |
A matrix whith n rows corresponding to subjects, and p columns to graph nodes (variables). |
LO |
character for linear order method. If LO="TO" or LO="TL" the
topological order (resp. level) of the input graph is enabled, while LO="BU"
the data-driven bottom-up search of vertex (resp. layer) order is performed
using the vertices of the empty graph. By default |
beta |
Numeric value. Minimum absolute LASSO beta coefficient for
a new direct link to be retained in the final model. By default,
|
eta |
Numeric value. Minimum fixed eta threshold for bottom-up search
of vertex (eta = 0) or layer (eta > 0) ordering. Use eta = NULL, for estimation
of eta adaptively with half of the sample data. By default, |
lambdas |
A vector of regularization LASSO lambda values. If lambdas is
NULL, the |
penalty |
A logical value. Separate penalty factors can be applied to
each coefficient. This is a number that multiplies lambda to allow differential
shrinkage. Can be 0 for some variables, which implies no shrinkage, and that
variable is always included in the model. If TRUE (default) weights are based
on the graph edges: 0 (i.e., edge present) and 1 (i.e., missing edge) ensures
that the input edges will be retained in the final model. If FALSE the
|
verbose |
A logical value. If FALSE (default), the processed graphs will not be plotted to screen. |
... |
Currently ignored. |
Details
The extracted DAG is estimated using the two-step order search approach.
First a vertex (node) or level (layer) order of p nodes is determined, and from
this sort, the DAG can be learned using in step 2) penalized (L1) regressions
(Shojaie and Michailidis, 2010). The estimate linear order are obtained from
a priori graph topological vertex (TO) or level (TL) ordering, or with a
data-driven Bottom-up (BU) approach, assuming a SEM whose error terms have equal
variances (Peters and Bühlmann, 2014). The BU algorithm first estimates the last
element (the terminal vertex) using the diagonal entries of the inverse covariance
matrix with: t = argmin(diag(Omega)), or the terminal layer (> 1 vertices) with
d = diag(Omega)- t < eta. And then, it determines its parents with L1 regression.
After eliminating the last element (or layer) of the ordering, the algorithm applies
the same procedure until a DAG is completely estimated. In high-dimensional data
(n < p), the inverse covariance matrix is computed by glasso-based algorithm
(FHT, 2008), implemented in glasso
. If the input graph is
not acyclic, in TO or TL, a warning message will be raised, and a cycle-breaking
algorithm will be applied (see graph2dag
for details).
Output DAG will be colored: vertices in cyan, if they are source nodes, and in
orange, if they are sink nodes, and edges in gray, if they were present in the
input graph, and in green, if they are new edges generated by LASSO screening.
Value
A list of 3 igraph objects plus the vertex ordering:
"dag", the estimated DAG;
"dag.new", new estimated connections;
"dag.old", connections preserved from the input graph;
"LO", the estimated vertex ordering.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Friedman J, Hastie T, Tibshirani R (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441. <https://doi.org/10.1093/biostatistics/kxm045>
Friedman J, Hastie T, Tibshirani R (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, Vol. 33(1), 1-22. <https://doi.org/10.18637/jss.v033.i01>
Shojaie A, Michailidis G (2010). Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs. Biometrika, 97(3): 519-538. <https://doi.org/10.1093/biomet/asq038>
Jankova J, van de Geer S (2015). Confidence intervals for high-dimensional inverse covariance estimation. Electronic Journal of Statistics, 9(1): 1205-1229. <https://doi.org/10.1214/15-EJS1031>
Peters J, Bühlmann P (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika, 101(1):219–228. <https://doi.org/10.1093/biomet/ast043>
See Also
Examples
#Set function param
ig <- sachs$graph
X <- log(sachs$pkc)
group <- sachs$group
# DAG estimation (default values)
dag0 <- SEMdag(ig, X)
sem0 <- SEMrun(ig, X, group)
# Graphs
old.par <- par(no.readonly = TRUE)
par(mfrow=c(2,2), mar=rep(1,4))
plot(sachs$graph, layout=layout.circle, main="input graph")
plot(dag0$dag, layout=layout.circle, main = "Output DAG")
plot(dag0$dag.old, layout=layout.circle, main = "Inferred old edges")
plot(dag0$dag.new, layout=layout.circle, main = "Inferred new edges")
par(old.par)
# Four DAG estimation
dag1 <- SEMdag(ig, X, LO="TO")
dag2 <- SEMdag(ig, X, LO="TL")
dag3 <- SEMdag(ig, X, LO="BU", eta=0)
dag4 <- SEMdag(ig, X, LO="BU", eta=NULL)
unlist(dag1$LO)
dag2$LO
unlist(dag3$LO)
dag4$LO
# Graphs
old.par <- par(no.readonly = TRUE)
par(mfrow=c(2,2), mar=rep(2,4))
gplot(dag1$dag, main="TO")
gplot(dag2$dag, main="TL")
gplot(dag3$dag, main="BU")
gplot(dag4$dag, main="TLBU")
par(old.par)
SEM-based differential network analysis
Description
Creates a sub-network with perturbed edges obtained from the
output of SEMace
, comparable to the procedure in
Jablonski et al (2022), or of SEMrun
with two-group
and CGGM solver, comparable to the algorithm 2 in Belyaeva et al (2021).
To increase the efficiency of computations for large graphs, users can
select to break the network structure into clusters, and select the
topological clustering method (see clusterGraph
).
The function SEMrun
is applied iteratively on
each cluster (with size min > 10 and max < 500) to obtain the graph
with the full list of perturbed edges.
Usage
SEMdci(graph, data, group, type = "ace", method = "BH", alpha = 0.05, ...)
Arguments
graph |
Input network as an igraph object. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes (variables). |
group |
A binary vector. This vector must be as long as the number of subjects. Each vector element must be 1 for cases and 0 for control subjects. |
type |
Average Causal Effect (ACE) with two-group, "parents"
(back-door) adjustement set, and "direct" effects ( |
method |
Multiple testing correction method. One of the values
available in |
alpha |
Significance level (default = 0.05) for edge set selection. |
... |
Currently ignored. |
Value
An igraph object.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Belyaeva A, Squires C, Uhler C (2021). DCI: learning causal differences between gene regulatory networks. Bioinformatics, 37(18): 3067–3069. <https://doi: 10.1093/bioinformatics/btab167>
Jablonski K, Pirkl M, Ćevid D, Bühlmann P, Beerenwinkel N (2022). Identifying cancer pathway dysregulations using differential causal effects. Bioinformatics, 38(6):1550–1559. <https://doi.org/10.1093/bioinformatics/btab847>
Examples
## Not run:
#load SEMdata package for ALS data with 17K genes:
#devtools::install_github("fernandoPalluzzi/SEMdata")
#library(SEMdata)
# Nonparanormal(npn) transformation
library(huge)
data.npn<- huge.npn(alsData$exprs)
dim(data.npn) #160 17695
# Extract KEGG interactome (max component)
KEGG<- properties(kegg)[[1]]
summary(KEGG)
# KEGG modules with ALS perturbed edges using fast gready clustering
gD<- SEMdci(KEGG, data.npn, alsData$group, type="fgc")
summary(gD)
gcD<- properties(gD)
old.par <- par(no.readonly = TRUE)
par(mfrow=c(2,2), mar=rep(2,4))
gplot(gcD[[1]], l="fdp", main="max component")
gplot(gcD[[2]], l="fdp", main="2nd component")
gplot(gcD[[3]], l="fdp", main="3rd component")
gplot(gcD[[4]], l="fdp", main="4th component")
par(old.par)
## End(Not run)
SEM-based gene set analysis
Description
Gene Set Analysis (GSA) via self-contained test for group
effect on signaling (directed) pathways based on SEM. The core of the
methodology is implemented in the RICF algorithm of SEMrun()
,
recovering from RICF output node-specific group effect p-values, and
Brown’s combined permutation p-values of node activation and inhibition.
Usage
SEMgsa(g = list(), data, group, method = "BH", alpha = 0.05, n_rep = 1000, ...)
Arguments
g |
A list of pathways to be tested. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes (variables). |
group |
A binary vector. This vector must be as long as the number of subjects. Each vector element must be 1 for cases and 0 for control subjects. |
method |
Multiple testing correction method. One of the values
available in |
alpha |
Gene set test significance level (default = 0.05). |
n_rep |
Number of randomization replicates (default = 1000). |
... |
Currently ignored. |
Details
For gaining more biological insights into the functional roles of pre-defined subsets of genes, node perturbation obtained from RICF fitting has been combined with up- or down-regulation of genes from KEGG to obtain overall pathway perturbation as follows:
The node perturbation is defined as activated when the minimum among the p-values is positive; if negative, the status is inhibited.
Up- or down- regulation of genes (derived from KEGG database) has been obtained from the weighted adjacency matrix of each pathway as column sum of weights over each source node. If the overall sum of node weights is below 1, the pathway is flagged as down-regulated otherwise as up-regulated.
The combination between these two quantities allows to define the direction (up or down) of gene perturbation. Up- or down regulated gene status, associated with node inhibition, indicates a decrease in activation (or increase in inhibition) in cases with respect to control group. Conversely, up- or down regulated gene status, associated with node activation, indicates an increase in activation (or decrease in inhibition) in cases with respect to control group.
Value
A list of 2 objects:
"gsa", A data.frame reporting the following information for each pathway in the input list:
"No.nodes", pathway size (number of nodes);
"No.DEGs", number of differential espression genes (DEGs) within the pathway, after multiple test correction with one of the methods available in
p.adjust
;"pert", pathway perturbation status (see details);
"pNA", Brown's combined P-value of pathway node activation;
"pNI", Brown's combined P-value of pathway node inhibition;
"PVAL", Bonferroni combined P-value of pNA, and pNI; i.e., 2* min(pNA, PNI);
"ADJP", Adjusted Bonferroni P-value of pathway perturbation; i.e., min(No.pathways * PVAL; 1).
"DEG", a list with DEGs names per pathways.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Grassi, M., Tarantino, B. SEMgsa: topology-based pathway enrichment analysis with structural equation models. BMC Bioinformatics 23, 344 (2022). https://doi.org/10.1186/s12859-022-04884-8
Examples
## Not run:
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
# Selection of FTD-ALS pathways from kegg.pathways.Rdata
paths.name <- c("MAPK signaling pathway",
"Protein processing in endoplasmic reticulum",
"Endocytosis",
"Wnt signaling pathway",
"Neurotrophin signaling pathway",
"Amyotrophic lateral sclerosis")
j <- which(names(kegg.pathways) %in% paths.name)
GSA <- SEMgsa(kegg.pathways[j], als.npn, alsData$group,
method = "bonferroni", alpha = 0.05,
n_rep = 1000)
GSA$gsa
GSA$DEG
## End(Not run)
Search for directed or shortest paths between pairs of source-sink nodes
Description
Find and fit all directed or shortest paths between two source-sink nodes of a graph.
Usage
SEMpath(graph, data, group, from, to, path, verbose = FALSE, ...)
Arguments
graph |
An igraph object. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes (variables). |
group |
A binary vector. This vector must be as long as the
number of subjects. Each vector element must be 1 for cases and 0
for control subjects. If |
from |
Starting node name (i.e., source node). |
to |
Ending node name (i.e., sink node). |
path |
If |
verbose |
Show the directed (or shortest) path between the given source-sink pair inside the input graph. |
... |
Currently ignored. |
Value
A list of four objects: a fitted model object of class
lavaan
("fit"), aggregated and node-specific
group effect estimates and P-values ("gest"), the extracted subnetwork
as an igraph object ("graph"), and the input graph with a color
attribute mapping the chosen path ("map").
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
# Directed path fitting
path <- SEMpath(graph = sachs$graph, data = log(sachs$pkc),
group = sachs$group,
from = "PIP3",
to = "Erk",
path = "directed")
# Summaries
summary(path$fit)
print(path$gest)
# Graphs
gplot(path$map, main="path from PiP2 to Erk")
plot(path$map, layout=layout.circle, main="path from PiP2 to Erk")
Fit a graph as a Structural Equation Model (SEM)
Description
SEMrun()
converts a (directed, undirected, or mixed)
graph to a SEM and fits it. If a binary group variable (i.e., case/control)
is present, node-level or edge-level perturbation is evaluated.
This function can handle loop-containing models, although multiple
links between the same two nodes (including self-loops and mutual
interactions) and bows (i.e., a directed and a bidirected link between
two nodes) are not allowed.
Usage
SEMrun(
graph,
data,
group = NULL,
fit = 0,
algo = "lavaan",
start = NULL,
SE = "standard",
n_rep = 1000,
limit = 100,
...
)
Arguments
graph |
An igraph object. |
data |
A matrix whith rows corresponding to subjects, and columns to graph nodes (variables). |
group |
A binary vector. This vector must be as long as the
number of subjects. Each vector element must be 1 for cases and 0
for control subjects. If |
fit |
A numeric value indicating the SEM fitting mode.
If |
algo |
MLE method used for SEM fitting. If |
start |
Starting value of SEM parameters for |
SE |
If "standard" (default), with |
n_rep |
Number of randomization replicates (default = 1000),
for permutation flip or boostrap samples, if |
limit |
An integer value corresponding to the network size
(i.e., number of nodes). Beyond this limit, the execution under
|
... |
Currently ignored. |
Details
SEMrun maps data onto the input graph and converts it into a
SEM. Directed connections (X -> Y) are interpreted as direct causal
effects, while undirected, mutual, and bidirected connections are
converted into model covariances. SEMrun output contains different sets
of parameter estimates. Beta coefficients (i.e., direct effects) are
estimated from directed interactions and residual covariances (psi
coefficients) from bidirected, undirected, or mutual interactions.
If a group variable is given, exogenous group effects on nodes (gamma
coefficients) or edges (delta coefficients) will be estimated.
By default, maximum likelihood parameter estimates and P-values for
parameter sets are computed by conventional z-test (= estimate/SE),
and fits it through the lavaan
function, via
Maximum Likelihood Estimation (estimator = "ML", default estimator in
lavOptions
).
In case of high dimensionality (n.variables >> n.subjects), the covariance
matrix could not be semi-definite positive and thus parameter estimates
could not be done. If this happens, covariance matrix regularization
is enabled using the James-Stein-type shrinkage estimator implemented
in the function pcor.shrink
of corpcor R package.
Argument fit
determines how group influence is evaluated in the
model, as absent (fit = 0
), node perturbation (fit = 1
),
or edge perturbation (fit = 2
). When fit = 1
, the group
is modeled as an exogenous variable, influencing all the other graph
nodes. When fit = 2
, SEMrun estimates the differences
of the beta and/or psi coefficients (network edges) between groups.
This is equivalent to fit a separate model for cases and controls,
as opposed to one common model perturbed by the exogenous group effect.
Once fitted, the two models are then compared to assess significant
edge (i.e., direct effect) differences (d = beta1 - beta0).
P-values for parameter sets are computed by z-test (= d/SE), through
lavaan
. As an alternative to standard P-value
calculation, SEMrun may use either RICF (randomization or bootstrap
P-values) or GGM (de-biased asymptotically normal P-values) methods.
These algorithms are much faster than lavaan
in case of large input graphs.
Value
A list of 5 objects:
"fit", SEM fitted lavaan, ricf, or cggm object, depending on the MLE method specified by the
algo
argument;"gest" or "dest", a data.frame of node-specific ("gest") or edge-specific ("dest") group effect estimates and P-values;
"model", SEM model as a string if
algo = "lavaan"
, andNULL
otherwise;"graph", the induced subgraph of the input network mapped on data variables. Graph edges (i.e., direct effects) with P-value < 0.05 will be highlighted in red (beta > 0) or blue (beta < 0). If a group vector is given, nodes with significant group effect (P-value < 0.05) will be red-shaded (beta > 0) or lightblue-shaded (beta < 0);
"data", input data subset mapping graph nodes, plus group at the first column (if no group is specified, this column will take NA values).
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Pearl J (1998). Graphs, Causality, and Structural Equation Models. Sociological Methods & Research., 27(2):226-284. <https://doi.org/10.1177/0049124198027002004>
Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2): 1-36. <https://www.jstatsoft.org/v48/i02/>
Pepe D, Grassi M (2014). Investigating perturbed pathway modules from gene expression data via Structural Equation Models. BMC Bioinformatics, 15: 132. <https://doi.org/10.1186/1471-2105-15-132>
Drton M, Eichler M, Richardson TS (2009). Computing Maximum Likelihood Estimated in Recursive Linear Models with Correlated Errors. Journal of Machine Learning Research, 10(Oct): 2329-2348. <https://www.jmlr.org/papers/volume10/drton09a/drton09a.pdf>
Jankova, J., & Van De Geer, S (2019). Inference in high-dimensional graphical models. In Handbook of Graphical Models (2019). Chapter 14 (sec. 14.2): 325-349. Chapman & Hall/CRC. ISBN: 9780429463976
Hastie T, Tibshirani R, Friedman J. (2009). The Elements of Statistical Learning (2nd ed.). Springer Verlag. ISBN: 978-0-387-84858-7
Grassi M, Palluzzi F, Tarantino B (2022). SEMgraph: An R Package for Causal Network Analysis of High-Throughput Data with Structural Equation Models. Bioinformatics, 38 (20), 4829–4830 <https://doi.org/10.1093/bioinformatics/btac567>
See Also
See fitAncestralGraph
and fitConGraph
for RICF algorithm and constrained GGM algorithm details, respectively.
Examples
#### Model fitting (no group effect)
sem0 <- SEMrun(graph = sachs$graph, data = log(sachs$pkc))
summary(sem0$fit)
head(parameterEstimates(sem0$fit))
# Graphs
gplot(sem0$graph, main = "significant edge weights")
plot(sem0$graph, layout = layout.circle, main = "significant edge weights")
#### Model fitting (common model, group effect on nodes)
sem1 <- SEMrun(graph = sachs$graph, data = log(sachs$pkc),
group = sachs$group)
# Fitting summaries
summary(sem1$fit)
print(sem1$gest)
head(parameterEstimates(sem1$fit))
# Graphs
gplot(sem1$graph, main = "Between group node differences")
plot(sem1$graph, layout = layout.circle, main = "Between group node differences")
#### Two-group model fitting (group effect on edges)
sem2 <- SEMrun(graph = sachs$graph, data = log(sachs$pkc),
group = sachs$group,
fit = 2)
# Summaries
summary(sem2$fit)
print(sem2$dest)
head(parameterEstimates(sem2$fit))
# Graphs
gplot(sem2$graph, main = "Between group edge differences")
plot(sem2$graph, layout = layout.circle, main = "Between group edge differences")
# Fitting and visualization of a large pathway:
g <- kegg.pathways[["Neurotrophin signaling pathway"]]
G <- properties(g)[[1]]
summary(G)
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
g1 <- SEMrun(G, als.npn, alsData$group, algo = "cggm")$graph
g2 <- SEMrun(g1, als.npn, alsData$group, fit = 2, algo = "cggm")$graph
# extract the subgraph with node and edge differences
g2 <- g2 - E(g2)[-which(E(g2)$color != "gray50")]
g <- properties(g2)[[1]]
# plot graph
E(g)$color<- E(g2)$color[E(g2) %in% E(g)]
gplot(g, l="fdp", psize=40, main="node and edge group differences")
Tree-based structure learning methods
Description
Four tree-based structure learning methods are implemented with graph and data-driven algorithms.
Usage
SEMtree(
graph,
data,
seed,
type = "ST",
eweight = NULL,
alpha = 0.05,
verbose = FALSE,
...
)
Arguments
graph |
An igraph object. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes (variables). |
seed |
A vector of seed nodes. |
type |
Tree-based structure learning method. Four algorithms are available:
|
eweight |
Edge weight type for igraph object can be externally derived
using
|
alpha |
Threshold for rejecting a pair of node being independent in
"CPDAG" algorithm. The latter implements a natural v-structure identification
procedure by thresholding the pairwise sample correlations over all adjacent
pairs of edges with some appropriate threshold. By default,
|
verbose |
If TRUE, it shows the output tree (not recommended for large graphs). |
... |
Currently ignored. |
Details
A tree ia an acyclic graph with p vertices and p-1 edges. The graph method
refers to the Steiner Tree (ST), a tree from an undirected graph that connect "seed"
with additional nodes in the "most compact" way possible. The data-driven methods
propose fast and scalable procedures based on Chu-Liu–Edmonds’ algorithm (CLE) to
recover a tree from a full graph. The first method, called Causal Additive Trees (CAT)
uses pairwise mutual weights as input for CLE algorithm to recover a directed tree
(an "arborescence"). The second one applies CLE algorithm for skeleton recovery and
extends the skeleton to a tree (a "polytree") represented by a Completed Partially
Directed Acyclic Graph (CPDAG). Finally, the Minimum Spanning Tree (MST) connecting
an undirected graph with minimal edge weights can be identified.
To note, if the input graph is a directed graph, ST and MST undirected trees are
converted in directed trees using the orientEdges
function.
Value
An igraph
object. If type = "ST"
, seed nodes are
colored in "aquamarine" and connectors in "white". If type = "ST"
and
type = "MST"
, edges are colored in "green" if not present in the input,
graph. If type = "CPDAG"
, bidirected edges are colored in "black"
(if the algorithm is not able to establish the direction of the relationship
between x and y).
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Grassi M, Tarantino B (2023). SEMtree: tree-based structure learning methods with structural equation models. Bioinformatics, 39 (6), 4829–4830 <https://doi.org/10.1093/bioinformatics/btad377>
Kou, L., Markowsky, G., Berman, L. (1981). A fast algorithm for Steiner trees. Acta Informatica 15, 141–145. <https://doi.org/10.1007/BF00288961>
Prim, R.C. (1957). Shortest connection networks and some generalizations Bell System Technical Journal, 37 1389–1401.
Chow, C.K. and Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467.
Meek, C. (1995). Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, 403–410.
Jakobsen, M, Shah, R., Bühlmann, P., Peters, J. (2022). Structure Learning for Directed Trees. arXiv: <https://doi.org/10.48550/arxiv.2108.08871>.
Lou, X., Hu, Y., Li, X. (2022). Linear Polytree Structural Equation Models: Structural Learning and Inverse Correlation Estimation. arXiv: <https://doi.org/10.48550/arxiv.2107.10955>
Examples
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
# graph-based trees
graph <- alsData$graph
seed <- V(graph)$name[sample(1:vcount(graph), 10)]
tree1 <- SEMtree(graph, als.npn, seed=seed, type="ST", verbose=TRUE)
tree2 <- SEMtree(graph, als.npn, seed=NULL, type="MST", verbose=TRUE)
# data-driven trees
V <- colnames(als.npn)[colnames(als.npn) %in% V(graph)$name]
tree3 <- SEMtree(NULL, als.npn, seed=V, type="CAT", verbose=TRUE)
tree4 <- SEMtree(NULL, als.npn, seed=V, type="CPDAG", alpha=0.05, verbose=TRUE)
Missing edge testing implied by a DAG with Shipley's basis-set
Description
Compute all the P-values of the d-separation tests implied by the missing edges of a given acyclic graph (DAG). The conditioning set Z is represented, in a DAG, by the union of the parent sets of X and Y (Shipley, 2000). The results of every test, in a DAG, is then combined using the Fisher’s statistic in an overall test of the fitted model C = -2*sum(log(P-value(k))), where C is distributed as a chi-squared variate with df = 2k, as suggested by Shipley (2000).
Usage
Shipley.test(
graph,
data,
MCX2 = FALSE,
cmax = Inf,
limit = 100,
verbose = TRUE,
...
)
Arguments
graph |
A directed graph as an igraph object. |
data |
A data matrix with subjects as rows and variables as columns. |
MCX2 |
If TRUE, a Monte Carlo P-value of the combined C test is enabled using the R code of Shipley extracted from <https://github.com/BillShipley/CauseAndCorrelation>. |
cmax |
Maximum number of parents set, C. This parameter can be used to perform only those tests where the number of conditioning variables does not exceed the given value. High-dimensional conditional independence tests can be very unreliable. By default, cmax = Inf. |
limit |
An integer value corresponding to the graph size (vcount)
tolerance. Beyond this limit, multicore computation is enabled to
reduce the computational burden. By default, |
verbose |
If TRUE, Shipley's test results will be showed to screen (default = TRUE). |
... |
Currently ignored. |
Value
A list of three objects: (i) "dag": the DAG used to perform the Shipley test (ii) "dsep": the data.frame of all d-separation tests over missing edges in the DAG and (iii) "ctest": the overall Shipley's' P-value.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Shipley B (2000). A new inferential test for path models based on DAGs. Structural Equation Modeling, 7(2): 206-218. <https://doi.org/10.1207/S15328007SEM0702_4>
Examples
#\donttest{
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
sem <- SEMrun(alsData$graph, als.npn)
C_test <- Shipley.test(sem$graph, als.npn, MCX2 = FALSE)
#MC_test <- Shipley.test(sem$graph, als.npn, MCX2 = TRUE)
#}
Amyotrophic Lateral Sclerosis (ALS) dataset
Description
Expression profiling through high-throughput sequencing (RNA-seq) of 139 ALS patients and 21 healthy controls (HCs), from Tam et al. (2019).
Usage
alsData
Format
alsData is a list of 4 objects:
"graph", ALS graph as the largest connected component of the "Amyotrophic lateral sclerosis (ALS)" pathway from KEGG database;
"exprs", a matrix of 160 rows (subjects) and 318 columns (genes) extracted from the original 17695. This subset includes genes from KEGG pathways, needed to run SEMgraph examples. Raw data from the GEO dataset GSE124439 (Tam et al., 2019) were pre-processed applying batch effect correction, using the sva R package (Leek et al., 2012), to remove data production center and brain area biases. Using multidimensional scaling-based clustering, ALS-specific and an HC-specific clusters were generated. Misclassified samples were blacklisted and removed from the current dataset;
"group", a binary group vector of 139 ALS subjects (1) and 21 healthy controls (0);
"details", a data.frame reporting information about included and blacklisted samples.
Source
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE124439
References
Tam OH, Rozhkov NV, Shaw R, Kim D et al. (2019). Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia. Cell Reports, 29(5):1164-1177.e5. <https://doi.org/10.1016/j.celrep.2019.09.066>
Jeffrey T. Leek, W. Evan Johnson, Hilary S. Parker, Andrew E. Jaffe, and John D. Storey (2012). The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. Mar 15; 28(6): 882-883. <https://doi.org/10.1093/bioinformatics/bts034>
Examples
alsData$graph
dim(alsData$exprs)
table(alsData$group)
Node ancestry utilities
Description
Get ancestry for a collection of nodes in a graph.
These functions are wrappers for the original SEMID
R package.
Usage
ancestors(g, nodes)
descendants(g, nodes)
parents(g, nodes)
siblings(g, nodes)
Arguments
g |
An igraph object. |
nodes |
the nodes in the graph of which to get the ancestry. |
Value
a sorted vector of nodes.
References
Rina Foygel Barber, Mathias Drton and Luca Weihs (2019). SEMID: Identifiability of Linear Structural Equation Models. R package version 0.3.2. <https://CRAN.R-project.org/package=SEMID/>
Examples
# Get all ancestors
an <- V(sachs$graph)[ancestors(sachs$graph, "Erk")]; an
# Get parents
pa <- V(sachs$graph)[parents(sachs$graph, "PKC")]; pa
# Get descendants
de <- V(sachs$graph)[descendants(sachs$graph, "PKA")]; de
# Get siblings
sib <- V(sachs$graph)[siblings(sachs$graph, "PIP3")]; sib
Topological graph clustering
Description
Topological graph clustering methods.
Usage
clusterGraph(graph, type = "wtc", HM = "none", size = 5, verbose = FALSE, ...)
Arguments
graph |
An igraph object. |
type |
Topological clustering methods. If |
HM |
Hidden model type. Enables the visualization of the hidden
model, gHM. If set to "none" (default), no gHM igraph object is saved.
For each defined hidden module:
(i) if |
size |
Minimum number of nodes per module. By default, a minimum number of 5 nodes is required. |
verbose |
A logical value. If FALSE (default), the gHM igraph will not be plotted to screen, saving execution time (they will be returned in output anyway). |
... |
Currently ignored. |
Value
If HM is not "none" a list of 2 objects is returned:
"gHM", subgraph containing hidden modules as an igraph object;
"membership", cluster membership vector for each node.
If HM is "none", only the cluster membership vector is returned.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Fortunato S, Hric D. Community detection in networks: A user guide (2016). Phys Rep; 659: 1-44. <https://dx.doi.org/10.1016/j.physrep.2016.09.002>
Yu M, Hillebrand A, Tewarie P, Meier J, van Dijk B, Van Mieghem P, Stam CJ (2015). Hierarchical clustering in minimum spanning trees. Chaos 25(2): 023107. <https://doi.org/10.1063/1.4908014>
See Also
Examples
# Clustering ALS graph with WTC method and LV model
G <- properties(alsData$graph)[[1]]
clv <- clusterGraph(graph = G, type = "wtc", HM = "LV")
gplot(clv$gHM, l = "fdp")
table(clv$membership)
Module scoring
Description
Generate factor scores, principal component scores, or projection scores of latent, composite, and unmeasured variable modules, respectively, and fit them with an exogenous group effect.
Usage
clusterScore(
graph,
data,
group,
HM = "LV",
type = "wtc",
size = 5,
verbose = FALSE,
...
)
Arguments
graph |
An igraph object. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes. |
group |
A binary vector. This vector must be as long as the number of subjects. Each vector element must be 1 for cases and 0 for control subjects. |
HM |
Hidden model type. For each defined hidden module:
(i) if |
type |
Graph clustering method. If |
size |
Minimum number of nodes per hidden module. By default, a minimum number of 5 nodes is required. |
verbose |
A logical value. If TRUE, intermediate graphs will be
displayed during the execution. In addition, a reduced graph with
clusters as nodes will be fitted and showed to screen (see also
|
... |
Currently ignored. |
Value
A list of 3 objects:
"fit", hidden module fitting as a lavaan object;
"membership", hidden module nodes membership;
clusterGraph
function;"dataHM", data matrix with cluster scores in first columns.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Grassi M, Palluzzi F, Tarantino B (2022). SEMgraph: An R Package for Causal Network Analysis of High-Throughput Data with Structural Equation Models. Bioinformatics, 38 (20), 4829–4830 <https://doi.org/10.1093/bioinformatics/btac567>
See Also
See clusterGraph
and cplot
for graph clustering.
Examples
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
C <- clusterScore(graph = alsData$graph, data = als.npn,
group = alsData$group,
HM = "LV",
type = "wtc",
verbose = FALSE)
summary(C$fit)
head(C$dataHM)
table(C$membership)
Vertex and edge graph coloring on the base of fitting
Description
Add vertex and edge color attributes to an igraph object,
based on a fitting results data.frame generated by
SEMrun
.
Usage
colorGraph(
est,
graph,
group,
method = "none",
alpha = 0.05,
vcolor = c("lightblue", "white", "pink"),
ecolor = c("royalblue3", "gray50", "red2"),
ewidth = c(1, 2),
...
)
Arguments
est |
A data.frame of estimated parameters and p-values, derived
from the |
graph |
An igraph object. |
group |
group A binary vector. This vector must be as long as the number of subjects. Each vector element must be 1 for cases and 0 for control subjects. |
method |
Multiple testing correction method. One of the values
available in |
alpha |
Significance level for node and edge coloring
(by default, |
vcolor |
A vector of three color names. The first color is given
to nodes with P-value < alpha and beta < 0, the third color is given
to nodes with P-value < alpha and beta > 0, and the second is given
to nodes with P-value > alpha. By default,
|
ecolor |
A vector of three color names. The first color is given
to edges with P-value < alpha and regression coefficient < 0, the
third color is given to edges with P-value < alpha and regression
coefficient > 0, and the second is given to edges with P-value > alpha.
By default, |
ewidth |
A vector of two values. The first value refers to the basic edge width (i.e., edges with P-value > alpha), while the second is given to edges with P-value < alpha. By default ewidth = c(1, 2). |
... |
Currently ignored. |
Value
An igraph object with vertex and edge color and width attributes.
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
# Model fitting: node perturbation
sem1 <- SEMrun(graph = alsData$graph, data = alsData$exprs,
group = alsData$group,
fit = 1)
est1 <- parameterEstimates(sem1$fit)
# Model fitting: edge perturbation
sem2 <- SEMrun(graph = alsData$graph, data = alsData$exprs,
group = alsData$group,
fit = 2)
est20 <- subset(parameterEstimates(sem2$fit), group == 1)[, -c(4, 5)]
est21 <- subset(parameterEstimates(sem2$fit), group == 2)[, -c(4, 5)]
# Graphs
g <- alsData$graph
x <- alsData$group
old.par <- par(no.readonly = TRUE)
par(mfrow=c(2,2), mar=rep(1,4))
gplot(colorGraph(est = est1, g, group = x, method = "BH"),
main = "vertex differences")
gplot(colorGraph(est = sem2$dest, g, group = NULL),
main = "edge differences")
gplot(colorGraph(est = est20, g, group = NULL),
main = "edges for group = 0")
gplot(colorGraph(est = est21, g, group = NULL),
main = "edges for group = 1")
par(old.par)
Subgraph mapping
Description
Map groups of nodes onto an input graph, based on a membership vector.
Usage
cplot(graph, membership, l = layout.auto, map = FALSE, verbose = FALSE, ...)
Arguments
graph |
An igraph object. |
membership |
Cluster membership vector for each node. |
l |
graph layout. One of the |
map |
A logical value. Visualize cluster mapping over the input graph. If FALSE (default), visualization will be disabled. For large graphs, visualization may take long. |
verbose |
A logical value. If FALSE (default), the processed graphs will not be plotted to screen, saving execution time (they will be returned in output anyway). |
... |
Currently ignored. |
Value
The list of clusters and cluster mapping as igraph objects.
Author(s)
Mario Grassi mario.grassi@unipv.it
See Also
Examples
# Clustering ALS graph with WTC method
G <- alsData$graph
membership <- clusterGraph(graph = G, type = "wtc")
cplot(G, membership, map = TRUE, verbose = FALSE)
cplot(G, membership, map = FALSE, verbose = TRUE)
# The list of cluster graphs !
cg <- cplot(G, membership); cg
Graph conversion from dagitty to igraph
Description
Convert a dagitty object to a igraph object.
Usage
dagitty2graph(dagi, verbose = FALSE, ...)
Arguments
dagi |
A graph as a dagitty object ("dag" or "pdag"). |
verbose |
A logical value. If TRUE, the output graph is shown. This argument is FALSE by default. |
... |
Currently ignored. |
Value
An igraph object.
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
# Conversion from igraph to dagitty (and viceversa)
dagi <- graph2dagitty(sachs$graph, verbose = TRUE)
graph <- dagitty2graph(dagi, verbose = TRUE)
Cluster extraction utility
Description
Extract and fit clusters from an input graph.
Usage
extractClusters(
graph,
data,
group = NULL,
membership = NULL,
map = FALSE,
verbose = FALSE,
...
)
Arguments
graph |
Input network as an igraph object. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes (variables). |
group |
A binary vector. This vector must be as long as the
number of subjects. Each vector element must be 1 for cases and 0
for control subjects. Group specification enables node perturbation
testing. By default, |
membership |
A vector of cluster membership IDs. If NULL, clusters
will be automatically generated with |
map |
Logical value. If TRUE, the plot of the input graph
(coloured by cluster membership) will be generated along with independent
module plots. If the input graph is very large, plotting could be
computationally intensive (by default, |
verbose |
Logical value. If TRUE, a plot will be showed for each cluster. |
... |
Currently ignored. |
Value
A list of 3 objects:
"clusters", list of clusters as igraph objects;
"fit", list of fitting results for each cluster as a lavaan object;
"dfc", data.frame of summary results.
Author(s)
Fernando Palluzzi fernando.palluzzi@gmail.com
Examples
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
adjdata <- SEMbap(alsData$graph, als.npn)$data
# Clusters creation
clusters <- extractClusters(alsData$graph, adjdata, alsData$group)
print(clusters$dfc)
head(parameterEstimates(clusters$fit$HM1))
head(parameterEstimates(clusters$fit$HM2))
head(parameterEstimates(clusters$fit$HM4))
gplot(clusters$clusters$HM2)
# Map cluster on the input graph
g <- alsData$graph
c <- clusters$clusters$HM2
V(g)$color <- ifelse(V(g)$name %in% V(c)$name, "gold", "white")
gplot(g)
Factor analysis for high dimensional data
Description
Wrapper for Factor Analysis with potentially high dimensional variables implement in the "cate" R package (Author: Jingshu Wang [aut], Qingyuan Zhao [aut, cre] Maintainer: Qingyuan Zhao <qz280@cam.ac.uk>) that is optimized for the high dimensional problem where the number of samples n is less than the number of variables p.
Usage
factor.analysis(Y, r = 1, method = "pc")
Arguments
Y |
data matrix, a n*p matrix |
r |
number of factors (default, r =1) |
method |
algorithm to be used, "pc" (default) or "ml" |
Details
The two methods extracted from "cate" are quasi-maximum likelihood (ml), and principal component analysis (pc). The ml is iteratively solved the EM algorithm using the PCA solution as the initial value. See Bai and Li (2012) for more details.
Value
a list of objects
- Gamma
estimated factor loadings
- Z
estimated latent factors
- Sigma
estimated noise variance matrix
References
Jushan Bai and Kunpeng Li (2012). Statistical Analysis of Factor Models of High Dimension. The Annals of Statistics, 40 (1), 436-465 <https://doi.org/10.1214/11-AOS966>
Jingshu Wang and Qingyuan Zhao (2020). cate: High Dimensional Factor Analysis and Confounder Adjusted Testing and Estimation. R package version 1.1.1. <https://CRAN.R-project.org/package=cate>
Examples
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
## pc
pc<- factor.analysis(Y = als.npn, r = 2, method = "pc")
head(pc$Gamma)
head(pc$Z)
head(pc$Sigma)
## ml
ml <- factor.analysis(Y = als.npn, r = 2, method = "ml")
head(ml$Gamma)
head(ml$Z)
head(ml$Sigma)
Graph plotting with renderGraph
Description
Wrapper for function renderGraph of the R package Rgraphwiz.
Usage
gplot(
graph,
l = "dot",
main = "",
cex.main = 1,
font.main = 1,
color.txt = "black",
fontsize = 16,
cex = 0.6,
shape = "circle",
color = "gray70",
lty = 1,
lwd = 1,
w = "auto",
h = "auto",
psize = 80,
...
)
Arguments
graph |
An igraph or graphNEL object. |
l |
Any layout supported by |
main |
Plot main title (by default, no title is added). |
cex.main |
Main title size (default = 1). |
font.main |
Main title font (default = 1). Available options are: 1 for plain text, 2 for bold, 3 for italics, 4 for bold italics, and 5 for symbol. |
color.txt |
Node text color (default = "black"). |
fontsize |
Node text size (default = 16). |
cex |
Another argument to control node text size (default = 0.6). |
shape |
Node shape (default = "circle"). |
color |
Node border color (default = "gray70"). |
lty |
Node border outline (default = 1). Available options include: 0 for blank, 1 for solid line, 2 for dashed, 3 for dotted, 4 for dotdash, 5 for longdash, and 6 for twodash. |
lwd |
Node border thickness (default = 1). |
w |
Manual node width (default = "auto"). |
h |
Manual node height (default = "auto"). |
psize |
Automatic node size (default = 80). |
... |
Currently ignored. |
Value
gplot returns invisibly the graph object produced by Rgraphviz
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
gplot(sachs$graph, main = "input graph")
sem <- SEMrun(sachs$graph, sachs$pkc)
gplot(sem$graph, main = "output graph")
Convert directed graphs to directed acyclic graphs (DAGs)
Description
Remove cycles and bidirected edges from a directed graph.
Usage
graph2dag(graph, data, bap = FALSE, time.limit = Inf, ...)
Arguments
graph |
A directed graph as an igraph object. |
data |
A data matrix with subjects as rows and variables as columns. |
bap |
If TRUE, a bow-free acyclic path (BAP) is returned (default = FALSE). |
time.limit |
CPU time for the computation, in seconds (default = Inf). |
... |
Currently ignored. |
Details
The conversion is performed firstly by removing bidirected
edges and then the data matrix is used to compute edge P-values, through
marginal correlation testing (see weightGraph
,
r-to-z method). When a cycle is detected, the edge with highest
P-value is removed, breaking the cycle. If the bap argument is TRUE,
a BAP is generated merging the output DAG and the bidirected edges
from the input graph.
Value
A DAG as an igraph object.
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
dag <- graph2dag(graph = sachs$graph, data = log(sachs$pkc))
old.par <- par(no.readonly = TRUE)
par(mfrow=c(1,2), mar=rep(1, 4))
gplot(sachs$graph, main = "Input graph")
gplot(dag, main = "Output DAG")
par(old.par)
Graph conversion from igraph to dagitty
Description
Convert an igraph object to a dagitty object.
Usage
graph2dagitty(graph, graphType = "dag", verbose = FALSE, ...)
Arguments
graph |
A graph as an igraph or as an adjacency matrix. |
graphType |
character, is one of "dag" (default)' or "pdag". DAG can contain the directed (->) and bi-directed (<->) edges, while PDAG can contain the edges: ->, <->, and the undirected edges (–) that represent edges whose direction is not known. |
verbose |
A logical value. If TRUE, the output graph is shown. This argument is FALSE by default. |
... |
Currently ignored. |
Value
A dagitty object.
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
# Graph as an igraph object to dagitty object
G <- graph2dagitty(sachs$graph)
plot(dagitty::graphLayout(G))
Graph to lavaan model
Description
Convert an igraph object to a model (lavaan syntax).
Usage
graph2lavaan(graph, nodes = V(graph)$name, ...)
Arguments
graph |
A graph as an igraph object. |
nodes |
Subset of nodes to be included in the model. By default, all the input graph nodes will be included in the output model. |
... |
Currently ignored. |
Value
A model in lavaan syntax.
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
# Graph (igraph object) to structural model in lavaan syntax
model <- graph2lavaan(sachs$graph)
cat(model, "\n")
KEGG interactome
Description
Interactome generated by merging KEGG pathways
extracted using the ROntoTools
R package (update: Jan, 2025).
Usage
kegg
Format
"kegg" is an igraph network object of 4910 nodes and 45200 edges corresponding to the union of 231 KEGG pathways.
Source
References
Kanehisa M, Goto S (1999). KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acid Research 28(1): 27-30. <https://doi.org/10.1093/nar/27.1.29>
Calin Voichita, Sahar Ansari and Sorin Draghici (2023). ROntoTools: R Onto-Tools suite. R package version 2.30.0.
Examples
# KEGG graph
summary(kegg)
# KEGG degrees of freedom
vcount(kegg)*(vcount(kegg) - 1)/2 - ecount(kegg)
KEGG pathways
Description
KEGG pathways extracted using the ROntoTools
R package (update: Jan, 2025).
Usage
kegg.pathways
Format
"kegg.pathways" is a list of 231 igraph objects corresponding to the KEGG pathways.
Source
References
Kanehisa M, Goto S (1999). KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acid Research 28(1): 27-30. <https://doi.org/10.1093/nar/27.1.29>
Calin Voichita, Sahar Ansari and Sorin Draghici (2023). ROntoTools: R Onto-Tools suite. R package version 2.30.0.
Examples
library(igraph)
# KEGG pathways
names(kegg.pathways)
i<-which(names(kegg.pathways)=="Type II diabetes mellitus");i
ig<- kegg.pathways[[i]]
summary(ig)
V(ig)$name
E(ig)$weight
gplot(ig, l="fdp", psize=50, main=names(kegg.pathways[i]))
lavaan model to graph
Description
Convert a model, specified using lavaan syntax, to an igraph object.
Usage
lavaan2graph(model, directed = TRUE, psi = TRUE, verbose = FALSE, ...)
Arguments
model |
Model specified using lavaan syntax. |
directed |
Logical value. If TRUE (default), edge directions from the model will be preserved. If FALSE, the resulting graph will be undirected. |
psi |
Logical value. If TRUE (default) covariances will be converted into bidirected graph edges. If FALSE, covariances will be excluded from the output graph. |
verbose |
Logical value. If TRUE, a plot of the output graph will be generated. For large graphs, this could significantly increase computation time. If FALSE (default), graph plotting will be disabled. |
... |
Currently ignored. |
Value
An igraph object.
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
# Writing path diagram in lavaan syntax
model<-"
#path model
Jnk ~ PKA + PKC
P38 ~ PKA + PKC
Akt ~ PKA + PIP3
Erk ~ PKA + Mek
Mek ~ PKA + PKC + Raf
Raf ~ PKA + PKC
PKC ~ PIP2 + Plcg
PIP2 ~ PIP3 + Plcg
Plcg ~ PIP3
#(co)variances
PKA ~~ PIP3
"
# Graph with covariances
G0 <- lavaan2graph(model, psi = TRUE)
plot(G0, layout = layout.circle)
# Graph without covariances
G1 <- lavaan2graph(model, psi = FALSE)
plot(G1, layout = layout.circle)
Conditional Independence (CI) local tests of an acyclic graph
Description
P-values of one minimal testable implication (with the
smallest possible conditioning set) is returned per missing edge
given an acyclic graph (DAG or BAP) using the function
impliedConditionalIndependencies
plus the
function localTests
from package dagitty
.
Without assuming any particular dependence structure, the p-values of
every CI test, in a DAG (BAP), is then combined using the Bonferroni’s
statistic in an overall test of the fitted model, B = K*min(p1,...,pK),
as reviewed in Vovk & Wang (2020).
Usage
localCI.test(graph, data, bap = FALSE, limit = 100, verbose = TRUE, ...)
Arguments
graph |
A directed graph as an igraph object. |
data |
A data matrix with subjects as rows and variables as columns. |
bap |
If TRUE, the input graph is trasformend in a BAP, if FALSE (defult) the input graph is reduced in a DAG. |
limit |
An integer value corresponding to the size of the
extracted acyclic graph. Beyond this limit, switch to Shipley's
C-test (Shipley 2000) is enabled to reduce the computational burden.
By default, |
verbose |
If TRUE, LocalCI results will be showed to screen (default = TRUE). |
... |
Currently ignored. |
Value
A list of three objects: (i) "dag": the DAG used to perform the localCI test (ii) "msep": the list of all m-separation tests over missing edges in the input graph and (iii) "mtest":the overall Bonferroni's P-value.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Vovk V, Wang R (2020). Combining p-values via averaging. Biometrika 107(4): 791-808. <https://doi.org/10.1093/biomet/asaa027>
Shipley B (2000). A new inferential test for path models based on DAGs. Structural Equation Modeling, 7(2): 206-218. <https://doi.org/10.1207/S15328007SEM0702_4>
Examples
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
sem <- SEMrun(alsData$graph, als.npn)
B_test <- localCI.test(sem$graph, als.npn, verbose = TRUE)
Graph nodes merging by a membership attribute
Description
Merge groups of graph nodes using hierarchical clustering
with prototypes derived from protoclust
or
custom membership attribute (e.g., cluster membership derived from
clusterGraph
).
Usage
mergeNodes(
graph,
data,
h = 0.5,
membership = NULL,
HM = NULL,
verbose = FALSE,
...
)
Arguments
graph |
network as an igraph object. |
data |
A matrix or data.frame. Rows correspond to subjects, and
columns to graph nodes. If |
h |
Cutting the minimax clustering at height, h = 1 - abs(cor(j,k)),
yielding a merged node (and a reduced data set) in which every node in the
cluster has correlation of at least cor(j,k) with the prototype node.
By default, |
membership |
Cluster membership. A vector of cluster membership
identifiers as numeric values, where vector names correspond to graph
node names. By default, |
HM |
Hidden cluster label. If membership is derived from clusterGraph:
HM = "LV", a latent variable (LV) will be defined as common unknown cause
acting on cluster nodes. If HM = "CV", cluster nodes will be considered as
regressors of a latent composite variable (CV). Finally, if HM = "UV", an
unmeasured variable (UV) is defined, where source nodes of the module (i.e.,
in-degree = 0) act as common regressors influencing the other nodes
via an unmeasured variable. By default, |
verbose |
A logical value. If FALSE (default), the merged graphs will not be plotted to screen. |
... |
Currently ignored. |
Details
Hierarchical clustering with prototypes (or Minmax linkage) is unique in naturally associating a node (the prototypes) with every interior node of the dendogram. Thus, for each merge we have a single representative data point for the resulting cluster (Bien, Tibshirani, 2011). These prototypes can be used to greatly enhance the interpretability of merging nodes and data reduction for SEM fitting.
Value
A list of 2 objects is returned:
"gLM", A graph with merged nodes as an igraph object;
"membership", cluster membership vector for each node.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Bien J, Tibshirani R (2011). Hierarchical Clustering With Prototypes via Minimax Linkage. Journal of the American Statistical Association 106(495): 1075-1084. <doi:10.1198/jasa.2011.tm10183>
See Also
Examples
# Gene memberships with prototypes with h=0.5
G <- properties(alsData$graph)[[1]]
M <- mergeNodes(G, data = alsData$exprs, h = 0.5, verbose=TRUE)
# Gene memberships with EBC method and size=10
m <- clusterGraph(G, type = "ebc", size = 10)
M <- mergeNodes(G, membership = m, HM = "LV", verbose=TRUE)
# Gene memberships defined by user
c1 <- c("5894", "5576", "5567", "572", "598")
c2 <- c("6788", "84152", "2915", "836", "5530")
c3 <- c("5603", "6300", "1432", "5600")
m <- c(rep(1,5), rep(2,5), rep(3,4))
names(m) <- c(c1, c2, c3)
M <- mergeNodes(G, membership = m, HM = "CV", verbose=TRUE)
Optimal model search strategies
Description
Four model search strategies are implemented combining
SEMdag()
, SEMbap()
, and resizeGraph()
functions.
All strategies estimate a new graph by 1) adjusting (BAP deconfounding) the
the data matrix and 2) re-sizing the output DAG.
Usage
modelSearch(
graph,
data,
gnet = NULL,
d = 2,
search = "basic",
beta = 0,
method = "BH",
alpha = 0.05,
verbose = FALSE,
...
)
Arguments
graph |
Input graph as an igraph object. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes (variables). |
gnet |
Reference directed network used to validate and import nodes and interactions. |
d |
Maximum allowed geodesic distance for directed or undirected
shortest path search. A distance |
search |
Search strategy. Four model search strategies are available:
|
beta |
Numeric value. Minimum absolute LASSO beta coefficient for
a new interaction to be retained in the estimated DAG backbone. Lower
|
method |
Multiple testing correction method. One of the values
available in |
alpha |
Significance level for false discovery rate (FDR) used
for local d-separation tests. This argument is used to
control data de-correlation. A higher |
verbose |
If TRUE, it shows intermediate graphs during the execution (not recommended for large graphs). |
... |
Currently ignored. |
Details
Search strategies can be ordered by decreasing conservativeness
respect to the input graph, as: "direct", "inner", "outer", and "basic".
The first three strategies are knowledge-based, since they require an
input graph and a reference network, together with data, for
knowledge-assisted model improvement. The last one does not require
any reference and the output model structure will be data-driven.
Output model complexity can be limited using arguments d
and
beta
.
While d is fixed to 0 or 1 in "basic" or "direct", respectively;
we suggest starting with d = 2
(only one mediator)
for the other two strategies.
For knowledge-based strategies, we suggest to to start with
beta = 0
. Then, beta can be relaxed (0 to < 0.1) to improve
model fitting, if needed. Since data-driven models can be complex,
we suggest to start from beta = 0
when using the "basic" strategy.
The beta
value can be relaxed until a good model fit is obtained.
Argument alpha determines the extent of data adjustment: lower alpha
values for FDR correction correspond to a smaller number of significant
confounding factors, hence a weaker correction
(default alpha = 0.05
).
Value
The output model as well as the adjusted dataset are returned as a list of 2 objects:
"graph", the output model as an igraph object;
"data", the adjusted dataset.
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
# Comparison among different model estimation strategies
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
# Models estimation
m1 <- modelSearch(graph = alsData$graph, data = als.npn, gnet = kegg,
search = "direct", beta = 0, alpha = 0.05)
m2 <- modelSearch(graph = alsData$graph, data = als.npn, gnet = kegg,
d = 2, search = "inner", beta = 0, alpha = 0.05)
m3 <- modelSearch(graph = alsData$graph, data = als.npn, gnet = kegg,
d = 2, search = "outer", beta = 0, alpha = 0.05)
m4 <- modelSearch(graph = alsData$graph, data = als.npn, gnet = NULL,
search = "basic", beta = 0.1, alpha = 0.05)
# Graphs
#old.par <- par(no.readonly = TRUE)
#par(mfrow=c(2,2), mar= rep(1,4))
gplot(m1$graph, main = "direct graph")
gplot(m2$graph, main = "inner graph")
gplot(m3$graph, main = "outer graph")
gplot(m4$graph, main = "basic graph")
#par(old.par)
Assign edge orientation of an undirected graph
Description
Assign edge orientation of an undirected graph through a given reference directed graph. The vertex (color) and edge (color, width and weight) attributes of the input undirected graph are preserved in the output directed graph.
Usage
orientEdges(ug, dg, ...)
Arguments
ug |
An undirected graph as an igraph object. |
dg |
A directed reference graph. |
... |
Currently ignored. |
Value
A directed graph as an igraph object.
Examples
# Graphs definition
G0 <- as.undirected(sachs$graph)
# Reference graph-based orientation
G1 <- orientEdges(ug = G0, dg = sachs$graph)
# Graphs plotting
old.par <- par(no.readonly = TRUE)
par(mfrow=c(1,2), mar=rep(2,4))
plot(G0, layout=layout.circle, main = "Input undirected graph")
plot(G1, layout=layout.circle, main = "Output directed graph")
par(old.par)
Pairwise plotting of multivariate data
Description
Display a pairwise scatter plot of two datasets for a random selection of variables. If the second dataset is not given, the function displays a histogram with normal curve superposition.
Usage
pairwiseMatrix(x, y = NULL, size = nrow(x), r = 4, c = 4, ...)
Arguments
x |
A matrix or data.frame (n x p) of continuous data. |
y |
A matrix or data.frame (n x q) of continuous data. |
size |
number of rows to be sampled (default |
r |
number of rows of the plot layout (default |
c |
number of columns of the plot layout (default |
... |
Currently ignored. |
Value
No return value
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
adjdata <- SEMbap(sachs$graph, log(sachs$pkc))$data
rawdata <- log(sachs$pkc)
pairwiseMatrix(adjdata, rawdata, size = 1000)
Parameter Estimates of a fitted SEM
Description
Wrapper of the lavaan parameterEstimates() function for RICF and CGGM algorithms
Usage
parameterEstimates(fit, ...)
Arguments
fit |
A RICF or constrained GGM fitted model object. |
... |
Currently ignored. |
Value
A data.frame containing the estimated parameters
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
ricf1 <- SEMrun(sachs$graph, log(sachs$pkc), sachs$group, algo = "ricf")
parameterEstimates(ricf1$fit)
cggm1 <- SEMrun(sachs$graph, log(sachs$pkc), sachs$group, algo = "cggm")
parameterEstimates(cggm1$fit)
Perturbed path search utility
Description
This function uses SEMace
to find
significant causal effects between source-sink pairs and
SEMpath
to fit them and test their edge
perturbation.
Usage
pathFinder(
graph,
data,
group = NULL,
ace = NULL,
path = "directed",
method = "BH",
alpha = 0.05,
...
)
Arguments
graph |
Input network as an igraph object. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes (variables). |
group |
group A binary vector. This vector must be as long as the
number of subjects. Each vector element must be 1 for cases and 0
for control subjects. Group specification enables edge perturbation
testing. By default, |
ace |
A data.frame generated by |
path |
If |
method |
Multiple testing correction method. One of the values
available in |
alpha |
Significance level for ACE selection (by default,
|
... |
Currently ignored. |
Value
A list of 3 objects:
"paths", list of paths as igraph objects;
"fit", fitting results for each path as a lavaan object;
"dfp", a data.frame containing SEM global fitting statistics.
Author(s)
Fernando Palluzzi fernando.palluzzi@gmail.com
Examples
# Find and evaluate significantly perturbed paths
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
adjData <- SEMbap(alsData$graph, als.npn)$data
paths <- pathFinder(alsData$graph, adjData,
group = alsData$group,
ace = NULL)
print(paths$dfp)
head(parameterEstimates(paths$fit[[1]]))
gplot(paths$paths[[1]])
Graph properties summary and graph decomposition
Description
Produces a summary of network properties and returns graph components (ordered by decreasing size), without self-loops.
Usage
properties(graph, data = NULL, ...)
Arguments
graph |
Input network as an igraph object. |
data |
An optional data matrix (default data = NULL) whith rows corresponding to subjects, and columns to graph nodes (variables). Nodes will be mapped onto variable names. |
... |
Currently ignored. |
Value
List of graph components, ordered by decreasing size (the first component is the giant one), without self-loops.
Author(s)
Mario Grassi mario.grassi@unipv.it
Examples
# Extract the "Type II diabetes mellitus" pathway:
g <- kegg.pathways[["Type II diabetes mellitus"]]
summary(g)
properties(g)
Interactome-assisted graph re-seizing
Description
An input directed graph is re-sized, removing edges or adding edges/nodes. This function takes three input graphs: the first is the input causal model (i.e., a directed graph), and the second can be either a directed or undirected graph, providing a set of connections to be checked against a directed reference network (i.e., the third input) and imported to the first graph.
Usage
resizeGraph(g = list(), gnet, d = 2, v = TRUE, verbose = FALSE, ...)
Arguments
g |
A list of two graphs as igraph objects, g=list(graph1, graph2). |
gnet |
External directed network as an igraph object. The reference
network should have weighted edges, corresponding to their interaction
p-values, as an edge attribute |
d |
An integer value indicating the maximum geodesic distance between
two nodes in the interactome to consider the inferred interaction between
the same two nodes in |
v |
A logical value. If TRUE (default) new nodes and edges on the validated shortest path in the reference interactome will be added in the re-sized graph. |
verbose |
A logical value. If FALSE (default), the processed graphs will not be plotted to screen, saving execution time (for large graphs) |
... |
Currently ignored. |
Details
Typically, the first graph is an estimated causal graph (DAG),
and the second graph is the output of either SEMdag
or SEMbap
. Alternatively, the first graph is an
empthy graph, and the second graph is a external covariance graph.
In the former we use the new inferred causal structure stored in the
dag.new
object. In the latter, we use the new inferred covariance
structure stored in the guu
object. Both directed (causal) edges
inferred by SEMdag()
and covariances (i.e., bidirected edges)
added by SEMbap()
, highlight emergent hidden topological
proprieties, absent in the input graph. Estimated directed edges between
nodes X and Y are interpreted as either direct links or direct paths
mediated by hidden connector nodes. Covariances between any two bow-free
nodes X and Y may hide causal relationships, not explicitly represented
in the current model. Conversely, directed edges could be redundant or
artifact, specific to the observed data and could be deleted.
Function resizeGraph()
leverage on these concepts to extend/reduce a
causal model, importing new connectors or deleting estimated edges, if they are
present or absent in a given reference network. The whole process may lead to
the discovery of new paths of information flow, and cut edges not corroborate
by a validated network. Since added nodes can already be present in the causal
graph, network resize may create cross-connections between old and new paths
and their possible closure into circuits.
Value
"Ug", the re-sized graph, the graph union of the causal graph graph1
and the re-sized graph graph2
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Grassi M, Palluzzi F, Tarantino B (2022). SEMgraph: An R Package for Causal Network Analysis of High-Throughput Data with Structural Equation Models. Bioinformatics, 38 (20), 4829–4830 <https://doi.org/10.1093/bioinformatics/btac567>
Examples
# Extract the "Protein processing in endoplasmic reticulum" pathway:
g <- kegg.pathways[["Protein processing in endoplasmic reticulum"]]
G <- properties(g)[[1]]; summary(G)
# Extend a graph using new inferred DAG edges (dag+dag.new):
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
dag <- SEMdag(graph = G, data = als.npn, beta = 0.1)
gplot(dag$dag)
ext <- resizeGraph(g=list(dag$dag, dag$dag.new), gnet = kegg, d = 2)
gplot(ext)
# Create a directed graph from correlation matrix, using
# i) an empty graph as causal graph,
# ii) a covariance graph,
# iii) KEGG as reference:
corr2graph<- function(R, n, alpha=5e-6, ...)
{
Z <- qnorm(alpha/2, lower.tail=FALSE)
thr <- (exp(2*Z/sqrt(n-3))-1)/(exp(2*Z/sqrt(n-3))+1)
A <- ifelse(abs(R) > thr, 1, 0)
diag(A) <- 0
return(graph_from_adjacency_matrix(A, mode="undirected"))
}
v <- which(colnames(als.npn) %in% V(G)$name)
selectedData <- als.npn[, v]
G0 <- make_empty_graph(n = ncol(selectedData))
V(G0)$name <- colnames(selectedData)
G1 <- corr2graph(R = cor(selectedData), n= nrow(selectedData))
ext <- resizeGraph(g=list(G0, G1), gnet = kegg, d = 2, v = TRUE)
#Graphs
old.par <- par(no.readonly = TRUE)
par(mfrow=c(1,2), mar=rep(1,4))
plot(G1, layout = layout.circle)
plot(ext, layout = layout.circle)
par(old.par)
Sachs multiparameter flow cytometry data and consensus model
Description
Flow cytometry data and causal model from Sachs et al. (2005).
Usage
sachs
Format
"sachs" is a list of 5 objects:
"rawdata", a list of 14 data.frames containing raw flow cytometry data (Sachs et al., 2005);
"graph", consensus signaling network;
"model", consensus model (lavaan syntax);
"pkc", data.frame of 1766 samples and 11 variables, containing cd3cd28 (baseline) and pma (PKC activation) data;
"group", a binary group vector, where 0 is for cd3cd28 samples (n = 853) and 1 is for pma samples (n = 913).
"details", a data.frame containing dataset information.
Source
References
Sachs K, Perez O, Pe'er D, Lauffenburger DA, Nolan GP (2019). Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data. Science, 308(5721): 523-529.
Examples
# Dataset content
names(sachs$rawdata)
dim(sachs$pkc)
table(sachs$group)
cat(sachs$model)
gplot(sachs$graph)
GGM model summary
Description
Generate a summary for a constrained Gaussian Graphical Model (GGM) similar to lavaan-formated summary
Usage
## S3 method for class 'GGM'
summary(object, ...)
Arguments
object |
A constrained GGM fitted model object. |
... |
Currently ignored. |
Value
Shown the lavaan-formatted summary to console
Author(s)
Mario Grassi mario.grassi@unipv.it
See Also
Examples
sem0 <- SEMrun(sachs$graph, log(sachs$pkc), algo = "cggm")
summary(sem0$fit)
RICF model summary
Description
Generate a summary for a RICF solver similar to lavaan-formatted summary
Usage
## S3 method for class 'RICF'
summary(object, ...)
Arguments
object |
A RICF fitted model object. |
... |
Currently ignored. |
Value
Shown the lavaan-formatted summary to console
Author(s)
Mario Grassi mario.grassi@unipv.it
See Also
Examples
sem1 <- SEMrun(sachs$graph, log(sachs$pkc), sachs$group, algo = "ricf")
summary(sem1$fit)
Transform data methods
Description
Implements various data trasformation methods with optimal scaling for ordinal or nominal data, and to help relax the assumption of normality (gaussianity) for continuous data.
Usage
transformData(x, method = "npn", ...)
Arguments
x |
A matrix or data.frame (n x p). Rows correspond to subjects, and columns to graph nodes. |
method |
Trasform data method. It can be one of the following:
|
... |
Currently ignored. |
Details
Nonparanormal trasformation is computationally very efficient
and only requires one ECDF pass of the data matrix. Polychoric correlation
matrix is computed with the lavCor()
function of the lavaan
package. Optimal scaling (lineals and mca) is performed with the
lineals()
and corAspect()
functions of the aspect
package (Mair and De Leeuw, 2008). To note, SEM fitting of the generate data
(fake data) must be done with a covariance-based method and bootstrap SE,
i.e., with SEMrun(..., algo="ricf", n_rep=1000)
.
Value
A list of 2 objects is returned:
"data", the matrix (n x p) of n observations and p transformed variables or the matrix (n x p) of simulate observations based on the selected correlation matrix.
"catscores", the category weights for "lineals" or "mca" methods or NULL otherwise.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Liu H, Lafferty J, and Wasserman L (2009). The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs. Journal of Machine Learning Research 10(80): 2295-2328
Harris N, and Drton M (2013). PC Algorithm for Nonparanormal Graphical Models. Journal of Machine Learning Research 14 (69): 3365-3383
Olsson U (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4), 443-460.
Mair P, and De Leeuw J (2008). Scaling variables by optimizing correlational and non-correlational aspects in R. Journal of Statistical Software, 32(9), 1-23.
de Leeuw J (1988). Multivariate analysis with linearizable regressions. Psychometrika, 53, 437-454.
Examples
#... with continuous ALS data
graph<- alsData$graph
data<- alsData$exprs; dim(data)
X<- data[, colnames(data) %in% V(graph)$name]; dim(X)
npn.data<- transformData(X, method="npn")
sem0.npn<- SEMrun(graph, npn.data$data, algo="cggm")
mvnS.data<- transformData(X, method="spearman")
sem0.mvnS<- SEMrun(graph, mvnS.data$data, algo="cggm")
mvnK.data<- transformData(X, method="kendall")
sem0.mvnK<- SEMrun(graph, mvnK.data$data, algo="cggm")
#...with ordinal (K=4 categories) ALS data
Xord <- data.frame(X)
Xord <- as.data.frame(lapply(Xord, cut, 4, labels = FALSE))
colnames(Xord) <- sub("X", "", colnames(Xord))
mvnP.data<- transformData(Xord, method="polychoric")
sem0.mvnP<- SEMrun(graph, mvnP.data$data, algo="cggm")
#...with nominal (K=4 categories) ALS data
mca.data<- transformData(Xord, method="mca")
sem0.mca<- SEMrun(graph, mca.data$data, algo="cggm")
mca.data$catscores
gplot(sem0.mca$graph, l="fdp", main="ALS mca")
# plot colored graphs
#par(mfrow=c(2,2), mar=rep(1,4))
#gplot(sem0.npn$graph, l="fdp", main="ALS npm")
#gplot(sem0.mvnS$graph, l="fdp", main="ALS mvnS")
#gplot(sem0.mvnK$graph, l="fdp", main="ALS mvnK")
#gplot(sem0.mvnP$graph, l="fdp", main="ALS mvnP")
Graph weighting methods
Description
Add data-driven edge and node weights to the input graph.
Usage
weightGraph(graph, data, group = NULL, method = "r2z", limit = 10000, ...)
Arguments
graph |
An igraph object. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes. |
group |
Binary vector. This vector must be as long as the number
of subjects. Each vector element must be 1 for cases and 0 for control
subjects. By default, |
method |
Edge weighting method. It can be one of the following:
|
limit |
An integer value corresponding to the number of graph
edges. Beyond this limit, multicore computation is enabled to reduce
the computational burden. By default, |
... |
Currently ignored. |
Value
A weighted graph, as an igraph object.
Author(s)
Mario Grassi mario.grassi@unipv.it
References
Grassi M, Tarantino B (2023). [Supplementary material of] SEMtree: tree-based structure learning methods with structural equation models. Bioinformatics, 39 (6), 4829–4830 <https://doi.org/10.1093/bioinformatics/btad377>
Fisher RA (1915). Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population. Biometrika, 10(4), 507–521. <doi:10.2307/2331838>
Examples
# Graph weighting
G <- weightGraph(graph = sachs$graph,
data = log(sachs$pkc),
group = sachs$group,
method = "r2z")
# New edge attributes
head(E(G)$pv); summary(E(G)$pv)
head(E(G)$zsign); table(E(G)$zsign)
# New node attributes
head(V(G)$pv); summary(V(G)$pv)
head(V(G)$zsign); table(V(G)$zsign)