Type: | Package |
Date: | 2025-03-05 |
Title: | Gaussian Parsimonious Clustering Models with Covariates and a Noise Component |
Version: | 1.6.0 |
Description: | Clustering via parsimonious Gaussian Mixtures of Experts using the MoEClust models introduced by Murphy and Murphy (2020) <doi:10.1007/s11634-019-00373-8>. This package fits finite Gaussian mixture models with a formula interface for supplying gating and/or expert network covariates using a range of parsimonious covariance parameterisations from the GPCM family via the EM/CEM algorithm. Visualisation of the results of such models using generalised pairs plots and the inclusion of an additional noise component is also facilitated. A greedy forward stepwise search algorithm is provided for identifying the optimal model in terms of the number of components, the GPCM covariance parameterisation, and the subsets of gating/expert network covariates. |
Depends: | R (≥ 4.0.0) |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
URL: | https://cran.r-project.org/package=MoEClust |
BugReports: | https://github.com/Keefe-Murphy/MoEClust/issues |
LazyData: | true |
Imports: | lattice (≥ 0.12), matrixStats (≥ 1.0.0), mclust (≥ 6.1), mvnfast, nnet (≥ 7.3-0), vcd |
Suggests: | cluster (≥ 1.4.0), clustMD (≥ 1.2.1), geometry (≥ 0.4.0), knitr, rmarkdown, snow |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-03-05 18:07:59 UTC; Keefe |
Author: | Keefe Murphy |
Maintainer: | Keefe Murphy <keefe.murphy@mu.ie> |
Repository: | CRAN |
Date/Publication: | 2025-03-05 18:50:02 UTC |
MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component
Description
Clustering via parsimonious Gaussian Mixtures of Experts using the MoEClust models introduced by Murphy and Murphy (2020) <doi:10.1007/s11634-019-00373-8>. This package fits finite Gaussian mixture models with gating and/or expert network covariates using a range of parsimonious covariance parameterisations from the GPCM family via the EM/CEM algorithm. Visualisation of the results of such models using generalised pairs plots and the inclusion of an additional noise component is also facilitated.
Details
- Type:
Package
- Package:
MoEClust
- Version:
1.6.0
- Date:
2025-03-05 (this version), 2017-11-28 (original release)
- Licence:
GPL (>= 3)
Usage
The most important function in the MoEClust package is: MoE_clust
, for fitting the model via EM/CEM with gating and/or expert network covariates, supplied via formula interfaces.
MoE_compare
is provided for conducting model selection between different results from MoE_clust
using different covariate combinations &/or initialisation strategies, etc.
MoE_stepwise
is provided for conducting a greedy forward stepwise search to identify the optimal model in terms of the number of components, GPCM covariance type, and the subsets of gating/expert network covariates.
MoE_control
allows supplying additional arguments to MoE_clust
and MoE_stepwise
which govern, among other things, controls on the inclusion of an additional noise component and controls on the initialisation of the allocations for the EM/CEM algorithm.
A dedicated plotting function (plot.MoEClust
) exists for visualising the results using generalised pairs plots, for examining the gating network, and/or log-likelihood, and/or clustering uncertainties, and/or similarity matrix, and/or graphing model selection criteria values. The generalised pairs plots (MoE_gpairs
) visualise all pairwise relationships between clustered response variables and associated continuous, categorical, and/or ordinal covariates in the gating &/or expert networks, coloured according to the MAP classification, and also give the marginal distributions of each variable (incl. the covariates) along the diagonal.
An as.Mclust
method is provided to coerce the output of class "MoEClust"
from MoE_clust
to the "Mclust"
class, to facilitate use of plotting and other functions for the "Mclust"
class within the mclust package. As per mclust, MoEClust also facilitates modelling with an additional noise component (with or without the mixing proportion for the noise component depending on covariates).
Finally, a predict
method is provided for predicting the fitted response and probability of cluster membership (and by extension the MAP classification) for new data, in the form of new covariates and new response data, or new covariates only.
Other functions also exist, e.g. MoE_crit
, MoE_dens
, MoE_estep
, MoE_compare
, and aitken
, which are all used within MoE_clust
but are nonetheless made available for standalone use.
The package also contains two data sets: ais
and CO2data
.
Author(s)
Keefe Murphy [aut, cre], Thomas Brendan Murphy [ctb]
Maintainer: Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
See Also
Useful links:
Further details and examples are given in the associated vignette document:
vignette("MoEClust", package = "MoEClust")
Examples
data(ais)
# Fit two sets of models
res1 <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
modelNames=c("VEE", "EVE", "VVE"), network.data=ais)
res2 <- MoE_clust(ais[,3:7], G=2, equalPro=TRUE, expert= ~ sex,
modelNames=c("VEE", "EVE", "VVE"), network.data=ais)
# Compare the best model from each set of results
(comp <- MoE_compare(res1, res2, optimal.only=TRUE))
# Produce a plot for the optimal model
plot(comp$optimal, what="gpairs", show.dens=TRUE)
# Summarise its classification table, component parameters, and gating/expert networks
summary(comp$optimal, classification=TRUE, parameters=TRUE, networks=TRUE)
data(CO2data)
CO2 <- CO2data$CO2
GNP <- CO2data$GNP
# Fit a range of models
m1 <- MoE_clust(CO2, G=1:3)
m2 <- MoE_clust(CO2, G=2:3, gating= ~ GNP)
m3 <- MoE_clust(CO2, G=1:3, expert= ~ GNP)
m4 <- MoE_clust(CO2, G=2:3, gating= ~ GNP, expert= ~ GNP)
m5 <- MoE_clust(CO2, G=2:3, equalPro=TRUE)
m6 <- MoE_clust(CO2, G=2:3, expert= ~ GNP, equalPro=TRUE)
# Extract the model with highest BIC
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, criterion="bic"))
# See if a better model can be found using greedy forward stepwise selection
# Conduct a stepwise search on the same data
(mod1 <- MoE_stepwise(CO2, CO2data[,"GNP", drop=FALSE]))
# Conduct another stepwise search considering models with a noise component
(mod2 <- MoE_stepwise(CO2, CO2data[,"GNP", drop=FALSE], noise=TRUE))
# Compare all sets of results to choose the optimal model
(best <- MoE_compare(mod1, mod2, comp, pick=1)$optimal)
GNP and CO2 Data Set
Description
This data set gives the gross national product (GNP) per capita in 1996 for various countries as well as their estimated carbon dioxide (CO2) emission per capita for the same year.
Usage
data(CO2data)
Format
This data frame consists of 28 countries and the following variables:
GNP
The gross product per capita in 1996.
CO2
The estimated carbon dioxide emission per capita in 1996.
country
An abbreviation pertaining to the country measures (e.g.
"GRC"
= Greece and"CH"
= Switzerland).
References
Hurn, M., Justel, A. and Robert, C. P. (2003) Estimating mixtures of regressions, Journal of Computational and Graphical Statistics, 12(1): 55-79.
Examples
data(CO2data, package="MoEClust")
plot(CO2data$GNP, CO2data$CO2, type="n", ylab=expression('CO'[2]))
text(CO2data$GNP, CO2data$CO2, CO2data$country)
Compute the Frobenius (adjusted) Rand index
Description
This function efficiently computes fuzzy generalisations of the Rand and adjusted Rand indices for comparing two partitions, allowing either or both partitions to be “soft” or “hard”.
Usage
FARI(z1,
z2)
Arguments
z1 , z2 |
A |
Details
If z1
&/or z2
is supplied as a vector of cluster labels, they will be coerced to an appropriate matrix via unmap
.
Value
A list with the following named components:
FRI
Measure of Frobenius Rand index between
z1
andz2
.FARI
Measure of Frobenius adjusted Rand index between
z1
andz2
.
Note
The number of columns of the matrices z1
and z2
need not be equal.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Andrew, J. L., Browne, R., and Hvingelby, C. D. (2022). On assessments of agreement between fuzzy partitions. Journal of Classification, 39(2): 326-342.
See Also
Examples
m1 <- MoE_clust(ais[,3:7], G=2, modelNames="EVE",
gating=~BMI, expert=~sex, network.data=ais)
m2 <- MoE_clust(ais[,3:7], G=2, modelNames="EVE",
equalPro=TRUE, expert=~sex, network.data=ais)
m3 <- MoE_clust(ais[,3:7], G=2, modelNames="VEE", algo="CEM", tau0=0.1)
# FARI between two soft partitions
FARI(m1$z, m2$z)
# FARI between soft and hard partitions
FARI(m1$z, m3$z)
# FARI between soft partition and hard classification
FARI(m1$z, m2$classification)
# FARI between hard partition and hard classification
FARI(m3$z, m3$classification)
# FARI between hard classification and hard classification
FARI(m1$classification, m2$classification)
Average posterior probabilities of a fitted MoEClust model
Description
Calculates the per-component average posterior probabilities of a fitted MoEClust model.
Usage
MoE_AvePP(x,
group = TRUE)
Arguments
x |
An object of class |
group |
A logical indicating whether the average posterior probabilities should be computed per component. Defaults to |
Details
When group=TRUE
, this function calculates AvePP, the average posterior probabilities of membership for each component for the observations assigned to that component via MAP probabilities. Otherwise, an overall measure of clustering certainty is returned.
Value
When group=TRUE
, a named vector of numbers, of length equal to the number of components (G), in the range [1/G,1], such that larger values indicate clearer separation of the clusters. Note that G=x$G
for models without a noise component and G=x$G + 1
for models with a noise component. When group=FALSE
, a single number in the same range is returned.
Note
This function will always return values of 1
for all components for models fitted using the "CEM"
algorithm (see MoE_control
), or models with only one component.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
See Also
MoE_clust
, MoE_control
, MoE_entropy
Examples
data(ais)
res <- MoE_clust(ais[,3:7], G=3, gating= ~ BMI + sex,
modelNames="EEE", network.data=ais)
# Calculate the AvePP per component
MoE_AvePP(res)
# Calculate an overall measure of clustering certainty
MoE_AvePP(res, group=FALSE)
Plot the Similarity Matrix of a MoEClust Mixture Model
Description
Produces a heatmap of the similarity matrix constructed from the res$z
matrix at convergence of a MoEClust mixture model.
Usage
MoE_Similarity(res,
col = grDevices::heat.colors(30L, rev=TRUE),
reorder = TRUE,
legend = TRUE,
...)
Arguments
res |
An object of class |
col |
A vector of colours as per |
reorder |
A logical (defaults to |
legend |
A logical (defaults to |
... |
Catches unused arguments, or arguments to be passed to |
Value
The similarity matrix in the form of a heatmap is plotted; the matrix itself can also be returned invisibly. The invisibly returned matrix will also be reordered if reordered=TRUE
.
Note
plot.MoEClust
is a wrapper to MoE_Similarity
which accepts the default arguments, and also produces other types of plots.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
Examples
data(ais)
mod <- MoE_clust(ais[,3:7], G=2, modelNames="EEE", gating= ~ SSF + Ht,
expert= ~ sex, network.data=ais, tau0=0.1, noise.gate=FALSE)
sim <- MoE_Similarity(mod)
Plot Clustering Uncertainties
Description
Plots the clustering uncertainty for every observation from a fitted "MoEClust"
model, including models with a noise component.
Usage
MoE_Uncertainty(res,
type = c("barplot", "profile"),
truth = NULL,
decreasing = FALSE,
col = c("cluster", "uncertain", "none"),
rug1d = TRUE,
...)
Arguments
res |
An object of class |
type |
The type of plot to be produced (defaults to |
truth |
An optional argument giving the true classification of the data. When |
decreasing |
A logical indicating whether uncertainties should be ordered in decreasing order (defaults to |
col |
Only relevant when |
rug1d |
A logical which is relevant only when |
... |
Catches unused arguments. |
Details
The y-axis of this plot runs from 0
to 1 - 1/res$G
, with a horizontal line also drawn at 1/res$G
. When type="barplot"
, uncertainties greater than this value are given a different colour when truth
is not supplied, otherwise misclassified observations are given a different colour. Note, however, that G^{(0)}
= res$G + 1
is used in place of res$G
for models with a noise component.
Value
A plot showing the clustering uncertainty of each observation (sorted in increasing/decreasing order when type="profile"
). The (unsorted) vector of uncertainties can also be returned invisibly. When truth
is supplied, the indices of the misclassified observations are also invisibly returned.
Note
plot.MoEClust
is a wrapper to MoE_Uncertainty
which accepts the default arguments, and also produces other types of plots.
An alternative means of visualising clustering uncertainties (at least for multivariate data) is provided by the functions MoE_gpairs
and plot.MoEClust
, specifically when their argument response.type
is given as "uncertainty"
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
MoE_clust
, MoE_gpairs
, plot.MoEClust
Examples
data(ais)
res <- MoE_clust(ais[,3:7], gating= ~ sex, G=3, modelNames="EEE", network.data=ais)
# Produce an uncertainty barplot
MoE_Uncertainty(res)
# Change the colour scheme
MoE_Uncertainty(res, col="uncertain")
# Produce an uncertainty profile plot
MoE_Uncertainty(res, type="profile")
# Let's assume the true clusters correspond to sex
(ub <- MoE_Uncertainty(res, truth=ais$sex))
(up <- MoE_Uncertainty(res, type="profile", truth=ais$sex))
# Examine the effect of rug1d for univariate models
mod <- MoE_clust(CO2data$CO2, expert=~GNP, G=2, modelNames="V", network.data=CO2data)
MoE_Uncertainty(mod, rug1d=FALSE)
MoE_Uncertainty(mod)
MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component
Description
Fits MoEClust models: Gaussian Mixture of Experts models with GPCM/mclust-family covariance structures. In other words, performs model-based clustering via the EM/CEM algorithm where covariates are allowed to enter neither, either, or both the mixing proportions (gating network) and/or component densities (expert network) of a Gaussian Parsimonious Clustering Model, with or without an additional noise component. Additional arguments are available via the function MoE_control
, including the specification of a noise component, controls on the initialisation of the algorithm, and more.
Usage
MoE_clust(data,
G = 1:9,
modelNames = NULL,
gating = ~1,
expert = ~1,
control = MoE_control(...),
network.data = NULL,
...)
## S3 method for class 'MoEClust'
print(x,
digits = 3L,
...)
## S3 method for class 'MoEClust'
summary(object,
classification = TRUE,
parameters = FALSE,
networks = FALSE,
...)
Arguments
data |
A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables. | ||||||||||||
G |
An integer vector specifying the number(s) of mixture components (clusters) to fit. Defaults to | ||||||||||||
modelNames |
A vector of character strings indicating the models to be fitted in the EM/CEM phase of clustering. With
For single-component models these options reduce to:
For zero-component models with a noise component only the | ||||||||||||
gating |
A | ||||||||||||
expert |
A | ||||||||||||
control |
A list of control parameters for the EM/CEM and other aspects of the algorithm. The defaults are set by a call to | ||||||||||||
network.data |
An optional data frame (or a matrix with named columns) in which to look for the covariates in the | ||||||||||||
... |
An alternative means of passing control parameters directly via the named arguments of | ||||||||||||
x , object , digits , classification , parameters , networks |
Arguments required for the |
Details
The function effectively allows 6 different types of Gaussian Mixture of Experts model (as well as the different models in the GPCM/mclust family, for each): i) the standard finite Gaussian mixture with no covariates, ii) fixed covariates only in the gating network, iii) fixed covariates only in the expert network, iv) the full Mixture of Experts model with fixed covariates entering both the mixing proportions and component densities. By constraining the mixing proportions to be equal (see equalPro
in MoE_control
) two extra special cases are facilitated when gating covariates are excluded.
Note that having the same covariates in both networks is allowed. So too are interactions, transformations, and higher order terms (see formula
): the latter must be specified explicitly using the AsIs
operator (I
). Covariates can be continuous, categorical, logical, or ordinal, but the response must always be continuous.
While model selection in terms of choosing the optimal number of components and the GPCM/mclust model type is performed within MoE_clust
, using one of the criterion
options within MoE_control
, choosing between multiple fits with different combinations of covariates or different initialisation settings can be done by supplying objects of class "MoEClust"
to MoE_compare
.
Value
A list (of class "MoEClust"
) with the following named entries, mostly corresponding to the chosen optimal model (as determined by the criterion
within MoE_control
):
call |
The matched call. |
data |
The input data, as a |
modelName |
A character string denoting the GPCM/mclust model type at which the optimal |
n |
The number of observations in the |
d |
The dimension of the |
G |
The optimal number of mixture components according to |
BIC |
A matrix of all BIC values with |
ICL |
A matrix of all ICL values with |
AIC |
A matrix of all AIC values with |
bic |
The BIC value corresponding to the optimal model. May not necessarily be the optimal BIC. |
icl |
The ICL value corresponding to the optimal model. May not necessarily be the optimal ICL. |
aic |
The AIC value corresponding to the optimal model. May not necessarily be the optimal AIC. |
gating |
An object of class |
expert |
An object of class |
LOGLIK |
A matrix of all maximal log-likelihood values with |
loglik |
The vector of increasing log-likelihood values for every EM/CEM iteration under the optimal model. The last element of this vector is the maximum log-likelihood achieved by the parameters returned at convergence. |
linf |
An asymptotic estimate of the final converged maximised log-likelihood. Returned when |
df |
The number of estimated parameters in the optimal model (i.e. the number of ‘used’ degrees of freedom). Subtract this number from |
iters |
The total number of EM/CEM iterations for the optimal model. |
hypvol |
The hypervolume parameter for the noise component if required, otherwise set to |
parameters |
A list with the following named components:
|
z |
The final responsibility matrix whose |
classification |
The vector of cluster labels for the chosen model corresponding to |
uncertainty |
The uncertainty associated with the |
net.covs |
A data frame gathering the unique set of covariates used in the |
resid.data |
In the presence of expert network covariates, this is the augmented data actually used in the clustering at convergence, as a list of |
DF |
A matrix giving the numbers of estimated parameters (i.e. the number of ‘used’ degrees of freedom) for all visited models, with |
ITERS |
A matrix giving the total number of EM/CEM iterations for all visited models, with |
Dedicated plot
, predict
, print
, and summary
functions exist for objects of class "MoEClust"
. The results can be coerced to the "Mclust"
class to access other functions from the mclust package via as.Mclust
.
Note
Where BIC
, ICL
, AIC
, LOGLIK
, DF
, and ITERS
contain NA
entries, this corresponds to a model which was not run; for instance a VVV model is never run for single-component models as it is equivalent to EEE. As such, one can consider the value as not really missing, but equivalent to the EEE value. BIC
, ICL
, AIC
, LOGLIK
, DF
, and ITERS
all inherit the classes "MoECriterion"
and "mclustBIC"
, "mclustICL"
, etc., for which dedicated print
, summary
, and plot
methods exist.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458): 611-631.
See Also
See MoE_stepwise
for identifying the optimal model and its covariates via greedy forward stepwise selection.
MoE_control
, MoE_compare
, plot.MoEClust
, predict.MoEClust
, predict.MoE_gating
, predict.MoE_expert
, as.Mclust
, MoE_crit
, MoE_estep
, MoE_cstep
, MoE_dens
, mclustModelNames
, mclustVariance
, expert_covar
, aitken
, I
Examples
data(ais)
hema <- ais[,3:7]
sex <- ais$sex
BMI <- ais$BMI
# Fit a standard finite mixture model
m1 <- MoE_clust(hema, G=2:3)
# Allow covariates to enter the mixing proportions
m2 <- MoE_clust(hema, G=2:3, gating= ~ sex + BMI)
# Allow covariates to enter the component densities
m3 <- MoE_clust(hema, G=2:3, expert= ~ sex)
# Allow covariates to enter both the gating & expert network
m4 <- MoE_clust(hema, G=2:3, gating= ~ BMI, expert= ~ sex)
# Fit an equal mixing proportion model with an expert network covariate
m5 <- MoE_clust(hema, G=2:3, expert= ~ sex + BMI, equalPro=TRUE)
# Fit models with gating covariates & an additional noise component
m6 <- MoE_clust(hema, G=2:3, tau0=0.1, gating= ~ BMI, network.data=ais)
# Extract the model with highest BIC
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, criterion="bic"))
# See if a better model can be found using greedy forward stepwise selection
(step <- MoE_stepwise(ais[,3:7], ais))
(comp <- MoE_compare(comp, step, optimal.only=TRUE))
(best <- comp$optimal)
(summ <- summary(best, classification=TRUE, parameters=TRUE, networks=TRUE))
# Examine the expert network in greater detail
# (but refrain from inferring statistical significance!)
summary(best$expert)
# Visualise the results, incl. the gating network and log-likelihood
plot(best, what="gpairs", show.dens=TRUE)
plot(best, what="gating") # equal mixing proportions!
plot(best, what="loglik")
# Visualise the results using the 'lattice' package
z <- factor(best$classification, labels=paste0("Cluster", seq_len(best$G)))
lattice::splom(~ hema | sex, groups=z)
lattice::splom(~ hema | z, groups=sex)
Choose the best MoEClust model
Description
Takes one or more sets of MoEClust models fitted by MoE_clust
(or MoE_stepwise
) and ranks them according to the BIC, ICL, or AIC. It's possible to respect the internal ranking within each set of models, or to discard models within each set which were already deemed sub-optimal. This function can help with model selection via exhaustive or stepwise searches.
Usage
MoE_compare(...,
criterion = c("bic", "icl", "aic"),
pick = 10L,
optimal.only = FALSE)
## S3 method for class 'MoECompare'
print(x,
index = seq_len(x$pick),
posidens = TRUE,
rerank = FALSE,
digits = 3L,
details = TRUE,
maxi = length(index),
...)
Arguments
... |
One or more objects of class This argument is only relevant for the |
criterion |
The criterion used to determine the ranking. Defaults to |
pick |
The (integer) number of models to be ranked and compared. Defaults to |
optimal.only |
Logical indicating whether to only rank models already deemed optimal within each |
x , index , posidens , rerank , digits , details , maxi |
Arguments required for the associated
|
Details
The purpose of this function is to conduct model selection on "MoEClust"
objects, fit to the same data set, with different combinations of gating/expert network covariates or different initialisation settings.
Model selection will have already been performed in terms of choosing the optimal number of components and GPCM/mclust model type within each supplied set of results, but MoE_compare
will respect the internal ranking of models when producing the final ranking if optimal.only
is FALSE
: otherwise only those models already deemed optimal within each "MoEClust"
object will be ranked.
As such if two sets of results are supplied when optimal.only
is FALSE
, the 1st, 2nd, and 3rd best models could all belong to the first set of results, meaning a model deemed suboptimal according to one set of covariates could be superior to one deemed optimal under another set of covariates.
Value
A list of class "MoECompare"
, for which a dedicated print function exists, containing the following elements (each of length pick
, and ranked according to criterion
, where appropriate):
data |
The name of the data set to which the models were fitted. |
optimal |
The single optimal model (an object of class |
pick |
The final number of ranked models. May be different (i.e. less than) the supplied |
MoENames |
The names of the supplied |
modelNames |
The |
G |
The optimal numbers of components. |
df |
The numbers of estimated parameters. |
iters |
The numbers of EM/CEM iterations. |
bic |
BIC values, ranked according to |
icl |
ICL values, ranked according to |
aic |
AIC values, ranked according to |
loglik |
Maximal log-likelihood values. |
gating |
The gating formulas. |
expert |
The expert formulas. |
algo |
The algorithm used for fitting the model - either |
equalPro |
Logical indicating whether mixing proportions were constrained to be equal across components. |
hypvol |
Hypervolume parameters for the noise component if relevant, otherwise set to |
noise |
The type of noise component fitted (if any). Only displayed if at least one of the compared models has a noise component. |
noise.gate |
Logical indicating whether gating covariates were allowed to influence the noise component's mixing proportion. Only printed for models with a noise component, when at least one of the compared models has gating covariates. |
equalNoise |
Logical indicating whether the mixing proportion of the noise component for |
Note
The criterion
argument here need not comply with the criterion used for model selection within each "MoEClust"
object, but be aware that a mismatch in terms of criterion
may require the optimal model to be re-fit in order to be extracted, thereby slowing down MoE_compare
.
If random starts had been used via init.z="random.hard"
or init.z="soft.random"
, the optimal
model may not necessarily correspond to the highest-ranking model in the presence of a criterion mismatch, due to the randomness of the initialisation.
A dedicated print
function exists for objects of class "MoECompare"
.
plot.MoEClust
and as.Mclust
can both also be called on objects of class "MoECompare"
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
See Also
See MoE_stepwise
for identifying the optimal model and its covariates via greedy forward stepwise selection.
MoE_clust
, mclustModelNames
, plot.MoEClust
, as.Mclust
Examples
data(CO2data)
CO2 <- CO2data$CO2
GNP <- CO2data$GNP
# Fit a range of models
m1 <- MoE_clust(CO2, G=1:3)
m2 <- MoE_clust(CO2, G=2:3, gating= ~ GNP)
m3 <- MoE_clust(CO2, G=1:3, expert= ~ GNP)
m4 <- MoE_clust(CO2, G=2:3, gating= ~ GNP, expert= ~ GNP)
m5 <- MoE_clust(CO2, G=2:3, equalPro=TRUE)
m6 <- MoE_clust(CO2, G=2:3, expert= ~ GNP, equalPro=TRUE)
m7 <- MoE_clust(CO2, G=2:3, expert= ~ GNP, tau0=0.1)
# Rank only the optimal models and examine the best model
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, m7, optimal.only=TRUE))
(best <- comp$optimal)
(summ <- summary(best, classification=TRUE, parameters=TRUE, networks=TRUE))
# Examine all models visited, including those already deemed suboptimal
# Only print models with expert covariates & more than one component
comp2 <- MoE_compare(m1, m2, m3, m4, m5, m6, m7, pick=Inf)
print(comp2, index=comp2$expert != "None" & comp2$G > 1)
# Conduct a stepwise search on the same data
(mod1 <- MoE_stepwise(CO2, GNP))
# Conduct another stepwise search considering models with a noise component
(mod2 <- MoE_stepwise(CO2, GNP, noise=TRUE))
# Compare both sets of results to choose the optimal model
(best <- MoE_compare(mod1, mod2, optimal.only=TRUE)$optimal)
Set control values for use with MoEClust
Description
Supplies a list of arguments (with defaults) for use with MoE_clust
.
Usage
MoE_control(init.z = c("hc", "quantile", "kmeans", "mclust",
"random.hard", "soft.random", "list"),
noise.args = list(...),
asMclust = FALSE,
equalPro = FALSE,
exp.init = list(...),
algo = c("EM", "CEM", "cemEM"),
criterion = c("bic", "icl", "aic"),
stopping = c("aitken", "relative"),
z.list = NULL,
nstarts = 1L,
eps = .Machine$double.eps,
tol = c(1e-05, sqrt(.Machine$double.eps), 1e-08),
itmax = c(.Machine$integer.max, .Machine$integer.max, 1000L),
hc.args = list(...),
km.args = list(...),
posidens = TRUE,
init.crit = c("bic", "icl"),
warn.it = 0L,
MaxNWts = 1000L,
verbose = interactive(),
...)
Arguments
init.z |
The method used to initialise the cluster labels for the non-noise components. Defaults to Other options include When When Finally, when the model includes expert network covariates and |
noise.args |
A list supplying select named parameters to control inclusion of a noise component in the estimation of the mixture. If either or both of the arguments
In particular, the argument The arguments |
asMclust |
The default values of
|
equalPro |
Logical variable indicating whether or not the mixing proportions are to be constrained to be equal in the model. Default: |
exp.init |
A list supplying select named parameters to control the initialisation routine in the presence of expert network covariates (otherwise ignored):
|
algo |
Switch controlling whether models are fit using the |
criterion |
When either |
stopping |
The criterion used to assess convergence of the EM/CEM algorithm. The default ( Both stopping rules are ultimately governed by |
z.list |
A user supplied list of initial cluster allocation matrices, with number of rows given by the number of observations, and numbers of columns given by the range of component numbers being considered. In particular, |
nstarts |
The number of random initialisations to use when Conversely, if |
eps |
A scalar tolerance associated with deciding when to terminate computations due to computational singularity in covariances. Smaller values of |
tol |
A vector of length three giving relative convergence tolerances for 1) the log-likelihood of the EM/CEM algorithm, 2) parameter convergence in the inner loop for models with iterative M-step ( |
itmax |
A vector of length three giving integer limits on the number of iterations for 1) the EM/CEM algorithm, 2) the inner loop for models with iterative M-step ( The default is If, for any model with gating covariates, the multinomial logistic regression in the gating network fails to converge in |
hc.args |
A list supplying select named parameters to control the initialisation of the cluster allocations when
|
km.args |
A list supplying select named parameters to control the initialisation of the cluster allocations when
|
posidens |
A logical governing whether to continue running the algorithm even in the presence of positive log-densities. Defaults to |
init.crit |
The criterion to be used to determine the optimal model type to initialise with, when |
warn.it |
A single number giving the iteration count at which a warning will be printed if the EM/CEM algorithm has failed to converge. Defaults to |
MaxNWts |
The maximum allowable number of weights in the call to |
verbose |
Logical indicating whether to print messages pertaining to progress to the screen during fitting. By default is |
... |
Catches unused arguments. |
Details
MoE_control
is provided for assigning values and defaults within MoE_clust
and MoE_stepwise
.
While the criterion
argument controls the choice of the optimal number of components and GPCM/mclust model type, MoE_compare
is provided for choosing between fits with different combinations of covariates or different initialisation settings.
Value
A named list in which the names are the names of the arguments and the values are the values supplied to the arguments.
Note
Note that successfully invoking exp.init$clustMD
(though it defaults to FALSE
) affects the role of the arguments init.z
, hc.args
, and km.args
. Please read the documentation above carefully in this instance.
The initial allocation matrices before and after the invocation of the exp.init
related arguments are both stored as attributes in the object returned by MoE_clust
(named "Z.init"
and "Exp.init"
, respectively). If init.z="random.hard"
or init.z="soft.random"
and nstarts > 1
, the allocations corresponding to the best random start are stored (regardless of whether exp.init$estart
is invoked or not). This can be useful for supplying z.list
for future fits.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
MoE_clust
, MoE_stepwise
, aitken
, Mclust
, hc
, mclust.options
, quant_clust
, clustMD
, noise_vol
, hypvol
, convhulln
, ellipsoidhull
, MoE_compare
, multinom
Examples
ctrl1 <- MoE_control(criterion="icl", itmax=100, warn.it=15, init.z="random.hard", nstarts=5)
data(CO2data)
GNP <- CO2data$GNP
res <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, control=ctrl1)
# Alternatively, specify control arguments directly
res2 <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, stopping="relative")
# Supplying ctrl1 without naming it as 'control' can throw an error
## Not run:
res3 <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, ctrl1)
## End(Not run)
# Similarly, supplying control arguments via a mix of the ... construct
# and the named argument 'control' also throws an error
## Not run:
res4 <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, control=ctrl1, init.z="kmeans")
## End(Not run)
# Initialise via the mixed-type joint distribution of response & covariates
# Let the ICL criterion determine the optimal clustMD model type
# Constrain the mixing proportions to be equal
ctrl2 <- MoE_control(exp.init=list(clustMD=TRUE), init.crit="icl", equalPro=TRUE)
data(ais)
library(clustMD)
res4 <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
network.data=ais, control=ctrl2)
# Include a noise component by specifying its prior mixing proportion
res5 <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
network.data=ais, tau0=0.1)
# Include a noise component via an initial guess of which observations are noise
mdist <- mahalanobis(ais[,3:7], colMeans(ais[,3:7]), cov(ais[,3:7]))
cutp <- qchisq(p=0.95, df=ncol(ais[,3:7]))
res6 <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
network.data=ais, noise.init=mdist > cutp)
# Include a noise component by specifying tau0 as a vector
res7 <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
network.data=ais, tau0=pchisq(mdist, df=ncol(ais[,3:7])))
# Investigate the use of random starts
sex <- ais$sex
# resA uses deterministic starting values (by default) for each G value
system.time(resA <- MoE_clust(ais[,3:7], G=2, expert=~sex, equalPro=TRUE))
# resB passes each random start through the entire EM algorithm for each G value
system.time(resB <- MoE_clust(ais[,3:7], G=2, expert=~sex, equalPro=TRUE,
init.z="random.hard", nstarts=10))
# resC passes only the "best" random start through the EM algorithm for each G value
# this time, we also use soft rather than hard random starts
system.time(resC <- MoE_clust(ais[,3:7], G=2, expert=~sex, equalPro=TRUE,
init.z="soft.random", nstarts=10, estart=TRUE))
# Here, all three settings (given here in order of speed) identify & converge to the same model
MoE_compare(resA, resC, resB)
MoEClust BIC, ICL, and AIC Model-Selection Criteria
Description
Computes the BIC (Bayesian Information Criterion), ICL (Integrated Complete Likelihood), and AIC (Akaike Information Criterion) for parsimonious mixture of experts models given the log-likelihood, the dimension of the data, the number of mixture components in the model, the numbers of parameters in the gating and expert networks respectively, and, for the ICL, the numbers of observations in each component.
Usage
MoE_crit(modelName,
loglik,
n,
d,
G,
gating.pen = G - 1L,
expert.pen = G * d,
z = NULL,
df = NULL)
Arguments
modelName |
A character string indicating the model. The help file for |
loglik |
The log-likelihood for a data set with respect to the Gaussian mixture model specified in the |
n , d , G |
The number of observations in the data, dimension of the data, and number of components in the Gaussian mixture model, respectively, used to compute |
gating.pen |
The number of parameters of the gating network of the MoEClust model. Defaults to |
expert.pen |
The number of parameters of the expert network of the MoEClust model. Defaults to |
z |
The |
df |
An alternative way to specify the number of estimated parameters (or ‘used’ degrees of freedom) exactly. If supplied, the arguments |
Details
The function is vectorised with respect to the arguments modelName
and loglik
.
If model
is an object of class "MoEClust"
with G
components, the number of parameters for the gating.pen
and expert.pen
are length(coef(model$gating))
and G * length(coef(model$expert[[1]]))
, respectively.
Models with a noise component are facilitated here too, provided the extra number of parameters are accounted for by the user.
Value
A simplified array containing the BIC, AIC, number of estimated parameters (df
) and, if z
is supplied, also the ICL, for each of the given input arguments.
Note
In order to speed up repeated calls to the function inside MoE_clust
, no checks take place.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Biernacki, C., Celeux, G. and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7): 719-725.
See Also
MoE_clust
, nVarParams
, mclustModelNames
Examples
MoE_crit(modelName=c("VVI", "VVE", "VVV"), n=120, d=8,
G=3, loglik=c(-4036.99, -3987.12, -3992.45))
data(CO2data)
GNP <- CO2data$GNP
model <- MoE_clust(CO2data$CO2, G=1:2, expert= ~ GNP)
G <- model$G
name <- model$modelName
ll <- max(model$loglik)
n <- length(CO2data$CO2)
z <- model$z
# Compare BIC from MoE_crit to the BIC of the model
(bic2 <- MoE_crit(modelName=name, loglik=ll, n=n, d=1, G=G, z=z,
expert.pen=G * length(coef(model$expert[[1]])))["bic",])
identical(unname(bic2), model$bic) #TRUE
# Make the same comparison with the known number of estimated parameters
(bic3 <- MoE_crit(loglik=ll, n=n, df=model$df, z=z)["bic",])
identical(unname(bic3), bic2) #TRUE
C-step for MoEClust Models
Description
Function to compute the assignment matrix z and the conditional log-likelihood for MoEClust models, with the aid of MoE_dens
.
Usage
MoE_cstep(data,
mus,
sigs,
log.tau = 0L,
Vinv = NULL,
Dens = NULL)
Arguments
data |
If there are no expert network covariates, |
mus |
The mean for each of G components. If there is more than one component, this is a matrix whose k-th column is the mean of the k-th component of the mixture model. For the univariate models, this is a G-vector of means. In the presence of expert network covariates, all values should be equal to |
sigs |
The |
log.tau |
If covariates enter the gating network, an n times G matrix of mixing proportions, otherwise a G-vector of mixing proportions for the components of the mixture. Must be on the log-scale in both cases. The default of |
Vinv |
An estimate of the reciprocal hypervolume of the data region. See the function |
Dens |
(Optional) A numeric matrix whose |
Value
A list containing two elements:
z |
A matrix with |
loglik |
The estimated conditional log-likelihood. |
Note
This function is intended for joint use with MoE_dens
, using the log-densities. Caution is advised using this function without explicitly naming the arguments. Models with a noise component are facilitated here too.
The C-step can be replaced by an E-step, see MoE_estep
and the algo
argument to MoE_control
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
MoE_dens
, MoE_clust
, MoE_estep
, MoE_control
, mclustVariance
Examples
# MoE_cstep can be invoked for fitting MoEClust models via the CEM algorithm
# via the 'algo' argument to MoE_control:
data(ais)
hema <- ais[,3:7]
model <- MoE_clust(hema, G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais, algo="CEM")
Dens <- MoE_dens(data=hema, mus=model$parameters$mean,
sigs=model$parameters$variance, log.tau=log(model$parameters$pro))
# Construct the z matrix and compute the conditional log-likelihood
Cstep <- MoE_cstep(Dens=Dens)
(ll <- Cstep$loglik)
# Check that the z matrix & classification are the same as those from the model
identical(max.col(Cstep$z), as.integer(unname(model$classification))) #TRUE
identical(Cstep$z, model$z) #TRUE
# Call MoE_cstep directly
Cstep2 <- MoE_cstep(data=hema, sigs=model$parameters$variance,
mus=model$parameters$mean, log.tau=log(model$parameters$pro))
identical(Cstep2$loglik, ll) #TRUE
Density for MoEClust Mixture Models
Description
Computes densities (or log-densities) of observations in MoEClust mixture models.
Usage
MoE_dens(data,
mus,
sigs,
log.tau = 0L,
Vinv = NULL,
logarithm = TRUE)
Arguments
data |
If there are no expert network covariates, |
mus |
The mean for each of G components. If there is more than one component, this is a matrix whose k-th column is the mean of the k-th component of the mixture model. For the univariate models, this is a G-vector of means. In the presence of expert network covariates, all values should be equal to |
sigs |
The |
log.tau |
If covariates enter the gating network, an n times G matrix of mixing proportions, otherwise a G-vector of mixing proportions for the components of the mixture. Must be on the log-scale in both cases. The default of |
Vinv |
An estimate of the reciprocal hypervolume of the data region. See the function |
logarithm |
A logical value indicating whether or not the logarithm of the component densities should be returned. This defaults to |
Value
A numeric matrix whose [i,k]
-th entry is the density or log-density of observation i in component k, scaled by the mixing proportions. These densities are unnormalised.
Note
This function is intended for joint use with MoE_estep
or MoE_cstep
, using the log-densities. Note that models with a noise component are facilitated here too.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
MoE_estep
, MoE_cstep
, MoE_clust
, mclustVariance
Examples
data(ais)
hema <- ais[,3:7]
model <- MoE_clust(hema, G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais)
Dens <- MoE_dens(data=hema, mus=model$parameters$mean,
sigs=model$parameters$variance, log.tau=log(model$parameters$pro))
# Construct the z matrix and compute the log-likelihood
Estep <- MoE_estep(Dens=Dens)
(ll <- Estep$loglik)
# Check that the z matrix & classification are the same as those from the model
identical(max.col(Estep$z), as.integer(unname(model$classification))) #TRUE
identical(Estep$z, model$z) #TRUE
# The same can be done for models with expert covariates &/or a noise component
# Note for models with expert covariates that the mean has to be supplied as 0,
# and the data has to be supplied as "resid.data"
m2 <- MoE_clust(hema, G=2, expert= ~ sex, modelNames="EVE", network.data=ais, tau0=0.1)
Dens2 <- MoE_dens(data=m2$resid.data, sigs=m2$parameters$variance, mus=0,
log.tau=log(m2$parameters$pro), Vinv=m2$parameters$Vinv)
Entropy of a fitted MoEClust model
Description
Calculates the normalised entropy of a fitted MoEClust model.
Usage
MoE_entropy(x,
group = FALSE)
Arguments
x |
An object of class |
group |
A logical (defaults to |
Details
When group
is FALSE
, this function calculates the normalised entropy via
H=-\frac{1}{n\log(G)}\sum_{i=1}^n\sum_{g=1}^G\hat{z}_{ig}\log(\hat{z}_{ig})
,
where n
and G
are the sample size and number of components, respectively, and \hat{z}_{ig}
is the estimated posterior probability at convergence that observation i
belongs to component g
. Note that G=x$G
for models without a noise component and G=x$G + 1
for models with a noise component.
When group
is TRUE
,
H_i=-\frac{1}{\log(G)}\sum_{g=1}^G\hat{z}_{ig}\log(\hat{z}_{ig})
is computed for each observation and averaged according to the MAP classification.
Value
When group
is FALSE
, a single number, given by 1-H
, in the range [0,1], such that larger values indicate clearer separation of the clusters. Otherwise, a vector of length G
containing the per-component averages of the observation-specific entries is returned.
Note
This function will always return a normalised entropy of 1
for models fitted using the "CEM"
algorithm (see MoE_control
), or models with only one component.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
See Also
MoE_clust
, MoE_control
, MoE_AvePP
Examples
data(ais)
res <- MoE_clust(ais[,3:7], G=3, gating= ~ BMI + sex,
modelNames="EEE", network.data=ais)
# Calculate the normalised entropy
MoE_entropy(res)
# Calculate the normalised entropy per cluster
MoE_entropy(res, group=TRUE)
E-step for MoEClust Models
Description
Softmax function to compute the responsibility matrix z and the log-likelihood for MoEClust models, with the aid of MoE_dens
.
Usage
MoE_estep(data,
mus,
sigs,
log.tau = 0L,
Vinv = NULL,
Dens = NULL)
Arguments
data |
If there are no expert network covariates, |
mus |
The mean for each of G components. If there is more than one component, this is a matrix whose k-th column is the mean of the k-th component of the mixture model. For the univariate models, this is a G-vector of means. In the presence of expert network covariates, all values should be equal to |
sigs |
The |
log.tau |
If covariates enter the gating network, an n times G matrix of mixing proportions, otherwise a G-vector of mixing proportions for the components of the mixture. Must be on the log-scale in both cases. The default of |
Vinv |
An estimate of the reciprocal hypervolume of the data region. See the function |
Dens |
(Optional) A numeric matrix whose |
Value
A list containing two elements:
z |
A matrix with |
loglik |
The estimated log-likelihood, computed efficiently via |
Note
This softmax function is intended for joint use with MoE_dens
, using the log-densities. Caution is advised using this function without explicitly naming the arguments. Models with a noise component are facilitated here too.
The E-step can be replaced by a C-step, see MoE_cstep
and the algo
argument to MoE_control
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
MoE_dens
, MoE_clust
, MoE_cstep
, MoE_control
, logsumexp
, mclustVariance
, softmax
Examples
data(ais)
hema <- ais[,3:7]
model <- MoE_clust(hema, G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais)
Dens <- MoE_dens(data=hema, mus=model$parameters$mean,
sigs=model$parameters$variance, log.tau=log(model$parameters$pro))
# Construct the z matrix and compute the log-likelihood
Estep <- MoE_estep(Dens=Dens)
(ll <- Estep$loglik)
# Check that the z matrix & classification are the same as those from the model
identical(max.col(Estep$z), as.integer(unname(model$classification))) #TRUE
identical(Estep$z, model$z) #TRUE
# Call MoE_estep directly
Estep2 <- MoE_estep(data=hema, sigs=model$parameters$variance,
mus=model$parameters$mean, log.tau=log(model$parameters$pro))
identical(Estep2$loglik, ll) #TRUE
# The same can be done for models with expert covariates &/or a noise component
# Note for models with expert covariates that the mean has to be supplied as 0,
# and the data has to be supplied as "resid.data"
m2 <- MoE_clust(hema, G=2, expert= ~ sex, modelNames="EVE", network.data=ais, tau0=0.1)
Estep3 <- MoE_estep(data=m2$resid.data, sigs=m2$parameters$variance, mus=0,
log.tau=log(m2$parameters$pro), Vinv=m2$parameters$Vinv)
Generalised Pairs Plots for MoEClust Mixture Models
Description
Produces a matrix of plots showing pairwise relationships between continuous response variables and continuous/categorical/logical/ordinal associated covariates, as well as the clustering achieved, according to fitted MoEClust mixture models.
Usage
MoE_gpairs(res,
response.type = c("points", "uncertainty", "density"),
subset = list(...),
scatter.type = c("lm", "points"),
conditional = c("stripplot", "boxplot"),
addEllipses = c("outer", "yes", "no", "inner", "both"),
expert.covar = TRUE,
border.col = c("purple", "black", "brown", "brown", "navy"),
bg.col = c("cornsilk", "white", "palegoldenrod", "palegoldenrod", "cornsilk"),
outer.margins = list(bottom = grid::unit(2, "lines"),
left = grid::unit(2, "lines"),
top = grid::unit(2, "lines"),
right = grid::unit(2, "lines")),
outer.labels = NULL,
outer.rot = c(0, 90),
gap = 0.05,
buffer = 0.025,
uncert.cov = FALSE,
scatter.pars = list(...),
density.pars = list(...),
diag.pars = list(...),
stripplot.pars = list(...),
boxplot.pars = list(...),
barcode.pars = list(...),
mosaic.pars = list(...),
axis.pars = list(...),
...)
Arguments
res |
An object of class |
response.type |
The type of plot desired for the scatterplots comparing continuous response variables. Defaults to Points can also be sized according to their associated clustering uncertainty with the option Alternatively, the bivariate parametric |
subset |
A list giving named arguments for producing only a subset of panels:
The results of the subsetting must ensure that at least one panel of some sort can be plotted. The arguments |
scatter.type |
A vector of length 2 (or 1) giving the plot type for the upper and lower triangular portions of the plot, respectively, pertaining to the associated covariates. Defaults to |
conditional |
A vector of length 2 (or 1) giving the plot type for the upper and lower triangular portions of the plot, respectively, for plots involving a mix of categorical and continuous variables. Defaults to All |
addEllipses |
Controls whether to add MVN ellipses with axes corresponding to the within-cluster covariances for the response data. The options Ellipses are centered on the posterior mean of the fitted values when there are expert network covariates, otherwise on the posterior mean of the response variables. In the presence of expert network covariates, the component-specific covariance matrices are also (by default, via the argument |
expert.covar |
Logical (defaults to |
border.col |
A vector of length 5 (or 1) containing border colours for plots against the MAP classification, response vs. response, covariate vs. response, response vs. covariate, and covariate vs. covariate panels, respectively. Defaults to |
bg.col |
A vector of length 5 (or 1) containing background colours for plots against the MAP classification, response vs. response, covariate vs. response, response vs. covariate, and covariate vs. covariate panels, respectively. Defaults to |
outer.margins |
A list of length 4 with units as components named |
outer.labels |
The default is typically Note that axis labels always correspond to the range of the depicted variable, and thus should not be interpreted as indicating counts or densities for the diagonal panels when |
outer.rot |
A 2-vector ( |
gap |
The gap between the tiles; defaulting to |
buffer |
The fraction by which to expand the range of quantitative variables to provide plots that will not truncate plotting symbols. Defaults to |
uncert.cov |
A logical indicating whether the expansion factor for points on plots involving covariates should also be modified when |
scatter.pars |
A list supplying select parameters for the continuous vs. continuous scatterplots.
list(scat.pch=res$classification, uncert.pch=19, scat.col=res$classification, scat.size=unit(0.25, "char"), eci.col=1:res$G, noise.size=unit(0.2, "char")), where Note also that |
density.pars |
A list supplying select parameters for visualising the bivariate parametric density contours, only when
list(grid.size=c(100, 100), dcol="grey50", dens.points=FALSE, nlevels=11, show.labels=!dens.points, label.style="mixed"), where |
diag.pars |
A list supplying select parameters for panels along the diagonal.
list(diag.fontsize=9, diagonal=TRUE, hist.color=hist.color, show.hist=TRUE, show.counts=TRUE, show.dens=FALSE, dens.grid=100), where The argument When |
stripplot.pars |
A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of
list(strip.pch=res$classification, strip.size=unit(0.5, "char"), strip.col=res$classification, jitter=TRUE, size.noise=unit(0.4, "char")), where |
boxplot.pars |
A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of
list(box.pch="|", box.col="black", varwidth=FALSE, notch=FALSE, notch.frac=0.5, box.fill=1:res$G). All of the above are relevant for |
barcode.pars |
A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of
list(bar.col=res$G:1, nint=0, ptsize=scatter.pars$scat.size, ptpch=scatter.pars$scat.pch, bcspace=NULL, use.points=FALSE), where |
mosaic.pars |
A list supplying select parameters for categorical vs. categorical panels (if any).
list(shade=NULL, gp_labels=grid::gpar(fontsize=9), gp_args=list(), gp=list(), mfill=TRUE, mcol=1:res$G). The current default arguments and values thereof are passed through to |
axis.pars |
A list supplying select parameters for controlling the axes.
list(n.ticks=5, axis.fontsize=9). The argument |
... |
Catches unused arguments. Alternatively, named arguments can be passed directly here to any/all of |
Value
A generalised pairs plot showing all pairwise relationships between clustered response variables and associated gating &/or expert network continuous &/or categorical variables, coloured according to the MAP classification, with the marginal distributions of each variable along the diagonal.
Note
plot.MoEClust
is a wrapper to MoE_gpairs
which accepts the default arguments, and also produces other types of plots. Caution is advised producing generalised pairs plots when the dimension of the data is large.
Note that all colour-related defaults in scatter.pars
, stripplot.pars
, barcode.pars
, and mosaic.pars
above assume a specific colour-palette (see mclust.options("classPlotColors")
). Thus, for instance, specifying scatter.pars$scat.col=res$classification
will produce different results compared to leaving this argument unspecified. This is especially true for models with a noise component, for which the default is handled quite differently (for one thing, res$G
is the number of non-noise components). Similarly, all pch
-related defaults in scatter.pars
and stripplot.pars
above assume a specific set of plotting symbols also (see mclust.options("classPlotSymbols")
). Generally, all colour and symbol related arguments are strongly recommended to be left at their default values, unless being supplied as a single character string, e.g. "black"
for colours. To help in this regard, colour-related arguments sensibly inherent their defaults from scatter.pars$scat.col
if that is supplied and the argument in question is not.
Warning
For MoEClust
models with more than one expert network covariate, fitted lines produced in continuous covariate vs. continuous response scatterplots via scatter.type="lm"
or scatter.type="ci"
will NOT correspond to the coefficients in the expert network (res$expert
).
Caution is advised when producing "barcode"
plots for the conditional
panels. In some cases, resizing the graphics device after the production of the plot will result in distortion because of the way the rotation of non-horizontal barcodes is performed. Thus, when any(conditional == "barcode")
, it is advisable to ensure the dimensions of the overall plot are square. Furthermore, such plots may not display correctly anyway in RStudio's “Plots” pane and so a different graphics device may need to be used (but not subsequently resized).
Caution is also advised when producing generalised pairs plots when the dimension of the data is large.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
Emerson, J. W., Green, W. A., Schloerke, B., Crowley, J., Cook, D., Hofmann, H. and Wickham, H. (2013). The generalized pairs plot. Journal of Computational and Graphical Statistics, 22(1): 79-91.
See Also
MoE_clust
, MoE_stepwise
, plot.MoEClust
, MoE_Uncertainty
, expert_covar
, panel.stripplot
, panel.bwplot
, panel.violin
, strucplot
, mclust.options
Examples
data(ais)
res <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
network.data=ais, modelNames="EVE")
MoE_gpairs(res)
# Produce the same plot, but with a violin plot in the lower triangle.
# Colour the outline of the mosaic tiles rather than the interior using mfill.
# Size points in the response vs. response panels by their clustering uncertainty.
MoE_gpairs(res, conditional=c("stripplot", "violin"),
mfill=FALSE, response.type="uncertainty")
# Instead show the bivariate density contours of the response variables (without labels).
# (Plotting may be slow when response.type="density" for models with expert covariates.)
# Use different colours for histograms of covariates in the gating/expert/both networks.
# Also use different colours for response vs. covariate & covariate vs. response panels.
MoE_gpairs(res, response.type="density", show.labels=FALSE, dens.points=TRUE,
hist.color=c("black", "cyan", "hotpink", "chartreuse"),
bg.col=c("whitesmoke", "white", "mintcream", "mintcream", "floralwhite"))
# Examine effect of expert.covar & diag.grid in conjunction with show.dens & show.hist
MoE_gpairs(res, show.dens=TRUE, expert.covar=FALSE, show.hist=FALSE, diag.grid=20)
MoE_gpairs(res, show.dens=TRUE, expert.covar=TRUE, show.hist=TRUE, diag.grid=200)
# Explore various options to subset and rearrange the panels
MoE_gpairs(res, data.ind=5:1, cov.ind=0,
show.map=FALSE, show.hist=FALSE,
submat="upper", diagonal=FALSE)
# Produce a generalised pairs plot for a model with a noise component.
# Reorder the covariates and omit the variables "Hc" and "Hg".
# Use barcode plots for the categorical/continuous pairs.
# Magnify the size of scatter points assigned to the noise component.
resN <- MoE_clust(ais[,3:7], G=2, gating= ~ SSF + Ht, expert= ~ sex,
network.data=ais, modelNames="EEE", tau0=0.1, noise.gate=FALSE)
# Note that non-horizontal barcode panels may not display correctly in RStudio's "Plots" pane
# it may be necessary to first open a new device:
# dev.new()
MoE_gpairs(resN, data.ind=c(1,2,5), cov.ind=c(3,1,2), use.points=TRUE,
conditional="barcode", noise.size=grid::unit(0.5, "char"))
# Plots can be modified to show only a single (diagonal) panel of interest
MoE_gpairs(resN, data.ind=0, cov.ind=0)
MoE_gpairs(resN, data.ind=0, cov.ind="sex", show.map=FALSE)
MoE_gpairs(resN, data.ind="RCC", cov.ind=0, show.map=FALSE, show.dens=TRUE)
Mahalanobis Distance Outlier Detection for Multivariate Response
Description
Computes the Mahalanobis distance between the response variable(s) and the fitted values of linear regression models with multivariate or univariate responses.
Usage
MoE_mahala(fit,
resids,
squared = FALSE,
identity = NULL)
Arguments
fit |
A fitted |
resids |
The residuals. Can be residuals for observations included in the model, or residuals arising from predictions on unseen data. Must be coercible to a matrix with the number of columns being the number of response variables. Missing values are not allowed. |
squared |
A logical. By default ( |
identity |
A logical indicating whether the identity matrix is used in place of the precision matrix in the Mahalanobis distance calculation. Defaults to |
Value
A vector giving the Mahalanobis distance (or squared Mahalanobis distance) between response(s) and fitted values for each observation.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences, India, 2(1): 49-55.
Examples
data(ais)
hema <- as.matrix(ais[,3:7])
mod <- lm(hema ~ sex + BMI, data=ais)
res <- hema - predict(mod)
MoE_mahala(mod, res, squared=TRUE)
data(CO2data)
CO2 <- CO2data$CO2
GNP <- CO2data$GNP
mod2 <- lm(CO2 ~ GNP, data=CO2data)
pred <- predict(mod2)
res2 <- CO2 - pred
maha <- MoE_mahala(mod2, res2)
# Highlight outlying observations
plot(GNP, CO2, type="n", ylab=expression('CO'[2]))
lines(GNP, pred, col="red")
points(GNP, CO2, cex=maha, lwd=2)
text(GNP, CO2, col="blue",
labels=replace(as.character(CO2data$country), maha < 1, ""))
# Replicate initialisation strategy using 2 randomly chosen components
# Repeat the random initialisation if necessary
# (until 'crit' at convergence is minimised)
G <- 3L
z <- sample(seq_len(G), nrow(CO2data), replace=TRUE)
old <- Inf
crit <- .Machine$double.xmax
while(crit < old) {
Sys.sleep(1)
old <- crit
maha <- NULL
plot(GNP, CO2, type="n", ylab=expression('CO'[2]))
for(g in seq_len(G)) {
ind <- which(z == g)
mod <- lm(CO2 ~ GNP, data=CO2data, sub=ind)
pred <- predict(mod, newdata=CO2data[,"CO2", drop=FALSE])
maha <- cbind(maha, MoE_mahala(mod, CO2 - pred))
lines(GNP, pred, col=g + 1L)
}
min.M <- rowMins(maha)
crit <- sum(min.M)
z <- max.col(maha == min.M)
points(GNP, CO2, cex=min.M, lwd=2, col=z + 1L)
text(GNP, CO2, col=z + 1L,
labels=replace(as.character(CO2data$country), which(min.M <= 1), ""))
}
crit
Show the NEWS file
Description
Show the NEWS
file of the MoEClust package.
Usage
MoE_news()
Value
The MoEClust NEWS
file, provided the session is interactive.
Examples
MoE_news()
Model Selection Criteria Plot for MoEClust Mixture Models
Description
Plots the BIC, ICL, AIC, or log-likelihood values of a fitted MoEClust
object.
Usage
MoE_plotCrit(res,
criterion = c("bic", "icl", "aic", "loglik", "df", "iters"),
...)
Arguments
res |
An object of class |
criterion |
The criterion to be plotted. Defaults to |
... |
Catches other arguments, or additional arguments to be passed to |
Value
A plot of the values of the chosen criterion
. The values themselves can also be returned invisibly.
Note
plot.MoEClust
is a wrapper to MoE_plotCrit
which accepts the default arguments, and also produces other types of plots.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
MoE_clust
, MoE_control
, plot.MoEClust
, plot.mclustBIC
Examples
# data(ais)
# res <- MoE_clust(ais[,3:7], expert= ~ sex, network.data=ais)
# (crit <- MoE_plotCrit(res))
# Plots can also be produced directly
# plot(res$ICL)
Plot MoEClust Gating Network
Description
Plots the gating network for fitted MoEClust models, i.e. the observation index against the mixing proportions for that observation, coloured by cluster.
Usage
MoE_plotGate(res,
x.axis = NULL,
type = "b",
pch = 1,
xlab = "Observation",
ylab = expression(widehat(tau)[g]),
ylim = c(0, 1),
col = NULL,
...)
Arguments
res |
An object of class |
x.axis |
Optional argument for the x-axis against which the mixing proportions are plotted. Defaults to |
type , pch , xlab , ylab , ylim , col |
These graphical parameters retain their definitions from |
... |
Catches unused arguments, or additional arguments to be passed to |
Value
A plot of the gating network of the fitted MoEClust model. The parameters of the gating network can also be returned invisibly.
Note
plot.MoEClust
is a wrapper to MoE_plotGate
which accepts the default arguments, and also produces other types of plots.
By default, the noise component (if any) will be coloured "darkgrey"
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
MoE_clust
, plot.MoEClust
, matplot
Examples
data(ais)
res <- MoE_clust(ais[,3:7], gating= ~ BMI, G=3, modelNames="EEE",
network.data=ais, noise.gate=FALSE, tau0=0.1)
# Plot against the observation index and examine the gating network coefficients
(gate <- MoE_plotGate(res))
# Plot against BMI
MoE_plotGate(res, x.axis=ais$BMI, xlab="BMI")
# Plot against a categorical covariate
res2 <- MoE_clust(ais[,3:7], gating= ~ sex, G=3, modelNames="EVE", network.data=ais)
MoE_plotGate(res2, x.axis=ais$sex, xlab="sex")
Plot the Log-Likelihood of a MoEClust Mixture Model
Description
Plots the log-likelihood at every iteration of the EM/CEM algorithm used to fit a MoEClust mixture model.
Usage
MoE_plotLogLik(res,
type = "l",
xlab = "Iteration",
ylab = "Log-Likelihood",
xaxt = "n",
...)
Arguments
res |
An object of class |
type , xlab , ylab , xaxt |
These graphical parameters retain their usual definitions from |
... |
Catches unused arguments, or additional arguments to be passed to |
Value
A plot of the log-likelihood versus the number EM iterations. A list with the vector of log-likelihood values and the final value at convergence can also be returned invisibly.
Note
plot.MoEClust
is a wrapper to MoE_plotLogLik
which accepts the default arguments, and also produces other types of plots.
res$LOGLIK
can also be plotted, to compare maximal log-likelihood values for all fitted models.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
Examples
data(ais)
res <- MoE_clust(ais[,3:7], gating= ~ BMI, expert= ~ sex, tau0=0.1,
G=2, modelNames="EVE", network.data=ais)
(ll <- MoE_plotLogLik(res))
Stepwise model/variable selection for MoEClust models
Description
Conducts a greedy forward stepwise search to identify the optimal MoEClust
model according to some criterion
. Components and/or gating
covariates and/or expert
covariates are added to new MoE_clust
fits at each step, while each step is evaluated for all valid modelNames
.
Usage
MoE_stepwise(data,
network.data = NULL,
gating = NULL,
expert = NULL,
modelNames = NULL,
fullMoE = FALSE,
noise = FALSE,
initialModel = NULL,
initialG = NULL,
stepG = TRUE,
criterion = c("bic", "icl", "aic"),
equalPro = c("all", "both", "yes", "no"),
noise.gate = c("all", "both", "yes", "no"),
verbose = interactive(),
...)
Arguments
data |
A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables. |
network.data |
An optional matrix or data frame in which to look for the covariates specified in the |
gating |
A vector giving the names of columns in If |
expert |
A vector giving the names of columns in If |
modelNames |
A character string of valid model names, to be used to restrict the size of the search space, if desired. By default, all valid model types are explored. Rather than considering the changing of the model type as an additional step, every step is evaluated over all entries in Note that if |
fullMoE |
A logical which, when Furthermore, when In addition, caution is advised using this argument in conjunction with |
noise |
A logical indicating whether to assume all models contain an additional noise component ( |
initialModel |
An object of class If However, while |
initialG |
A single (positive) integer giving the number of mixture components (clusters) to initialise the stepwise search algorithm with. This is a simpler alternative to the |
stepG |
A logical indicating whether the algorithm should consider incrementing the number of components at each step. Defaults to |
criterion |
The model selection criterion used to determine the optimal action at each step. Defaults to |
equalPro |
A character string indicating whether models with equal mixing proportions should be considered. The default ( Considering |
noise.gate |
A character string indicating whether models where the gating network for the noise component depends on covariates are considered. The default ( Considering |
verbose |
Logical indicating whether to print messages pertaining to progress to the screen during fitting. By default is |
... |
Additional arguments to |
Details
The arguments modelNames
, equalPro
, and noise.gate
are provided for computational convenience. They can be used to reduce the number of models under consideration at each stage.
The same is true of the arguments gating
and expert
, which can each separately (or jointly, if fullMoE
is TRUE
) be made to consider all variables in network.data
, or a subset, or none at all.
Finally, initialModel
or initialG
can be used to kick-start the search algorithm by incorporating prior information in a more direct way; in the latter case, only in the form of the number of components; in the former case, a full model with a given number of components, certain included gating and expert network covariates, and a certain model type can give the model an even more informed head start. In either case, the stepG
argument can be used to fix the number of components and only search over different configurations of covariates.
Without any prior information, it is best to accept the defaults at the expense of a longer run-time.
Value
An object of class "MoECompare"
containing information on all visited models and the optimal model (accessible via x$optimal
).
Note
It is advised to run this function once with noise=FALSE
and once with noise=TRUE
and then choose the optimal model across both sets of results.
At present, only additions (of components and covariates) are considered. In future updates, it may be possible to allow both additions and removals.
The function will attempt to remove duplicate variables found in both data
and network.data
; in particular, they will be removed from network.data
. Users are however advised to carefully specify data
and network.data
such that there are no duplicates, especially if the desired variable(s) should belong to network.data
.
Finally, if the user intends to search for the best model according to the "icl"
criterion
, then specifying either initialModel
or initialG
is advisable. This is because the algorithm otherwise starts with a single component and thus there is no entropy term, meaning the stepwise search can quickly and easily get stuck at G=1
. See the examples below.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
See Also
MoE_clust
, MoE_compare
, MoE_control
Examples
# data(CO2data)
# Search over all models where the single covariate can enter either network
# (mod1 <- MoE_stepwise(CO2data$CO2, CO2data[,"GNP", drop=FALSE]))
#
# data(ais)
# Only look for EVE & EEE models with at most one expert network covariate
# Do not consider any gating covariates and only consider models with equal mixing proportions
# (mod2 <- MoE_stepwise(ais[,3:7], ais, gating=NA, expert="sex",
# equalPro="yes", modelNames=c("EVE", "EEE")))
#
# Look for models with noise & only those where the noise component's mixing proportion is constant
# Speed up the search with an initialModel, fix G, and restrict the covariates & model type
# init <- MoE_clust(ais[,3:7], G=2, modelNames="EEE",
# expert= ~ sex, network.data=ais, tau0=0.1)
# (mod3 <- MoE_stepwise(ais[,3:7], ais, noise=TRUE, expert="sex",
# gating=c("SSF", "Ht"), noise.gate="no",
# initialModel=init, stepG=FALSE, modelNames="EEE"))
#
# Compare both sets of results (with & without a noise component) for the ais data
# (comp1 <- MoE_compare(mod2, mod3, optimal.only=TRUE))
# comp1$optimal
#
# Target a model for the AIS data which is optimal in terms of ICL, without any restrictions
# mod4 <- MoE_stepwise(ais[,3:7], ais, criterion="icl")
#
# This gets stuck at a G=1 model, so specify an initial G value as a head start
# mod5 <- MoE_stepwise(ais[,3:7], ais, criterion="icl", initialG=2)
#
# Check that specifying an initial G value enables a better model to be found
# (comp2 <- MoE_compare(mod4, mod5, optimal.only=TRUE, criterion="icl"))
# Finally, restrict the search to full MoE models only
# Notice that the candidate covariates are the union of gating and expert
# Notice also that the algorithm initially traverses models with only
# expert covariates when the inclusion of gating covariates is infeasible
# mod6 <- MoE_stepwise(ais[,3:7], ais, fullMoE=TRUE, gating="BMI", expert="Bfat")
Australian Institute of Sport data
Description
Data on 102 male and 100 female athletes collected at the Australian Institute of Sport, courtesy of Richard Telford and Ross Cunningham.
Usage
data(ais)
Format
A data frame with 202 observations on the following 13 variables:
sex
categorical, levels =
female, male
sport
categorical, levels =
B_Ball, Field, Gym, Netball, Row, Swim, T_400m, Tennis, T_Sprnt, W_Polo
RCC
red cell count (numeric)
WCC
white cell count (numeric)
Hc
Hematocrit (numeric)
Hg
Hemoglobin (numeric)
Fe
plasma ferritin concentration (numeric)
BMI
body mass index:
Wt/(Ht)^2
(numeric)SSF
sum of skin folds (numeric)
Bfat
body fat percentage (numeric)
LBM
lean body mass (numeric)
Ht
height, cm (numeric)
Wt
weight, kg (numeric)
Details
The data have been made publicly available in connection with the book by Cook and Weisberg (1994).
References
Cook, R. D. and Weisberg, S. (1994), An Introduction to Regression Graphics. Volume 405 of Wiley Series in Probability and Statistics, New York, NY, USA: John Wiley & Sons.
Examples
data(ais, package="MoEClust")
pairs(ais[,c(3:7)], col=as.numeric(ais$sex), main = "AIS data")
apply(ais[,c(3:7)], 2, summary)
Aitken Acceleration
Description
Calculates the Aitken acceleration estimate of the final converged maximised log-likelihood under the EM/CEM framework.
Usage
aitken(loglik)
Arguments
loglik |
A vector of three consecutive log-likelihood values. These three values should be in ascending order, though this is not checked. |
Details
The final converged maximised log-likelihood can be used to determine convergence of the EM/CEM algorithm within MoE_clust
, i.e. by checking whether the absolute difference between the previous log-likelihood estimate and the final converged maximised log-likelihood estimate is less than some tolerance.
Value
A list with the following named components:
ll |
The most current estimate of the log-likelihood, i.e. |
linf |
The most current estimate of the final converged maximised log-likelihood. |
a |
The Aitken acceleration value where typically |
ldiff |
The difference between |
When the "aitken"
method is employed within MoE_clust
(via MoE_control
), ll
at convergence gives the log-likelihood achieved by the estimated parameters, while linf
at convergence estimates the log-likelihood that would be achieved after an infinite number of EM/CEM iterations.
Note
Within MoE_clust
, as specified by the stopping
argument of MoE_control
, "aitken"
is the default method used to assess convergence. The other option monitors the "relative"
change in log-likelihood against some tolerance. See MoE_control
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Boehning, D., Dietz, E., Schaub, R., Schlattmann, P. and Lindsay, B. G. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46(2): 373-388.
McNicholas, P. D., Murphy, T. B., McDaid, A. F. and Frost, D. (2010). Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Computational Statistics & Data Analysis, 54(3): 711-723.
See Also
Examples
(a1 <- aitken(-c(449.61534, 442.84221, 436.58999)))
a1$ldiff < 1e-05 # FALSE
(a2 <- aitken(-c(442.84221, 436.58999, 436.58998)))
a2$ldiff < 1e-05 # FALSE
(a3 <- aitken(-c(436.58999, 436.58998, 436.58998)))
a3$ldiff < 1e-05 # TRUE
Convert MoEClust objects to the Mclust class
Description
Converts an object of class "MoEClust"
generated by MoE_clust
and converts it to an object of class "Mclust"
as generated by fitting Mclust
, to facilitate use of plotting and other functions for the "Mclust"
class within the mclust package. Some caution is advised when converting models with gating &/or expert covariates (see Note below) and users are always encouraged to use the dedicated plot.MoEClust
function for objects of the "MoEClust"
class instead.
Usage
## S3 method for class 'MoEClust'
as.Mclust(x,
expert.covar = TRUE,
signif = 0L,
...)
Arguments
x |
An object of class |
expert.covar |
Logical (defaults to |
signif |
Significance level for outlier removal. Must be a single number in the interval [0, 1). Corresponds to the percentage of data to be considered extreme and therefore removed (half of |
... |
Further arguments to be passed to other methods. |
Details
Mixing proportions are averaged over observations in components in the presence of gating network covariates during the coercion. For models with expert network covariates, the means are given by the posterior means of the fitted values of the expert network.
In the presence of expert network covariates, the component-specific covariance matrices are (by default, via the argument expert.covar
) modified for plotting purposes via the function expert_covar
, in order to account for the extra variability of the means, usually resulting in bigger shapes & sizes for the MVN ellipses.
The signif
argument is intended only to aid visualisation via plot.Mclust
, as plots therein can be sensitive to outliers, particularly with regard to axis limits. However, users are always encouraged to use the dedicated plot.MoEClust
function for objects of the "MoEClust"
class instead (see Note below).
Value
An object of class "Mclust"
. See methods(class="Mclust")
for a (non-exhaustive) list of functions which can be applied to this class.
Note
Plots may be quite misleading in the presence of gating &/or (especially) expert network covariates when the what
argument is "density"
within plot.Mclust
; users are strongly encouraged to use MoE_gpairs
with response.type="density"
&/or show.dens=TRUE
instead.
Predictions (via predict.Mclust
) will also be misleading in the presence of covariates of any kind when newdata
is supplied; thus, users are strongly encouraged to use predict.MoEClust
instead.
The functions clustCombi
and clustCombiOptim
can be safely used (provided as.Mclust(x)
is supplied as the object
argument to clustCombi
), as they only rely on x$z
and x$G
only. See the examples below.
Users may expect MoEClust models with no covariates of any kind to be identical to models fitted via mclust, but this is not necessarily true: see the MoE_control
argument asMclust
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458): 611-631.
Scrucca L., Fop M., Murphy T. B. and Raftery A. E. (2016). mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8(1): 289-317.
See Also
Mclust
, plot.Mclust
, MoE_clust
, plot.MoEClust
, predict.MoEClust
, expert_covar
, MoE_control
, clustCombi
, clustCombiOptim
Examples
library(mclust)
# Fit a gating network mixture of experts model to the ais data
data(ais)
mod <- MoE_clust(ais[,3:7], G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais)
# Convert to the "Mclust" class and examine the classification
mod2 <- as.Mclust(mod)
plot(mod2, what="classification")
# Examine the uncertainty
plot(mod2, what="uncertainty")
# Return the optimal number of clusters according to entropy
combi <- mclust::clustCombi(object=mod2)
optim <- mclust::clustCombiOptim(object=combi)
table(mod2$classification, ais$sex)
table(optim$cluster.combi, ais$sex)
# Compare plot.MoEClust and plot.Mclust for univariate mixtures
data(CO2data)
res <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, modelNames="V", network.data=CO2data)
plot(as.Mclust(res), what="classification")
plot(as.Mclust(res), what="density")
plot(as.Mclust(res), what="uncertainty")
# Proper version of what="density" plot:
MoE_gpairs(res, show.map=FALSE, cov.ind=0, show.dens=TRUE)
# Equivalent what="uncertainty" plot:
MoE_Uncertainty(res)
Drop constant variables from a formula
Description
Drops constant variables from the RHS of a formula taking the data set (dat
), the formula (formula
), and an optional subset vector (sub
) as arguments.
Usage
drop_constants(dat,
formula,
sub = NULL)
Arguments
dat |
A |
formula |
An object of class |
sub |
An optional vector specifying a subset of observations to be used in the fitting process. |
Value
The updated formula with constant variables removed.
Note
Formulas with and without intercepts are accommodated.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
Examples
data(ais)
hema <- as.matrix(ais[,3:7])
sex <- ais$sex
BMI <- ais$BMI
# Set up a no-intercept regression formula with constant column 'sex'
form1 <- as.formula(hema ~ sex + BMI + I(BMI^2) - 1)
sub <- ais$sex == "male"
# Try fitting a linear model
mod1 <- try(lm(form1, data=ais, subset=sub), silent=TRUE)
inherits(mod1, "try-error") # TRUE
# Remove redundant variables from formula & try again
form2 <- drop_constants(ais, form1, sub)
mod2 <- try(lm(form2, data=ais, subset=sub), silent=TRUE)
inherits(mod2, "try-error") # FALSE
Drop unused factor levels to predict from unseen data
Description
Drops unseen factor levels in newdata
for which predictions are required from a lm
or multinom
model fit
.
Usage
drop_levels(fit,
newdata)
Arguments
fit |
|
newdata |
A |
Value
A data.frame
like newdata
with unseen factor levels replaced by NA
.
Note
This function is so far untested for models other than lm
or multinom
, though it may still work for other classes.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
Examples
data(ais)
hema <- as.matrix(ais[,3:7])
BMI <- ais$BMI
sport <- ais$sport
sub <- ais$sport != "Row"
# Fit a linear model
mod <- lm(hema ~ BMI + sport, data=ais, subset=sub)
# Make predictions
pred1 <- try(predict(mod, newdata=ais), silent=TRUE)
inherits(pred1, "try-error") #TRUE
# Remove unused levels and try again
pred2 <- try(predict(mod, newdata=drop_levels(mod, ais)), silent=TRUE)
inherits(pred2, "try-error") #FALSE
anyNA(pred2) #TRUE
Account for extra variability in covariance matrices with expert covariates
Description
In the presence of expert network covariates, this helper function modifies the component-specific covariance matrices of a "MoEClust"
object, in order to account for the extra variability due to the component means, usually resulting in bigger shapes & sizes for the MVN ellipses in MoE_gpairs
plots. The function also works for univariate response data.
Usage
expert_covar(x,
weighted = TRUE,
...)
Arguments
x |
An object of class |
weighted |
A logical indicating whether the estimated cluster membership probabilities should be used to provide a weighted estimate of the variability due to the component means. Defaults to |
... |
Catches unused arguments. |
Details
This function is used internally by MoE_gpairs
, plot.MoEClust(x, what="gpairs")
, and as.Mclust
, for visualisation purposes.
Value
The variance
component only from the parameters
list from the output of a call to MoE_clust
, modified accordingly.
Note
The modelName
of the resulting variance
object may not correspond to the model name of the "MoEClust"
object, in particular scale
, shape
, &/or orientation
may no longer be constrained across clusters, and cholsigma
, if it was in the input, will be discarded from the output. Usually, the modelName
of the transformed variance
object will be "VVV"
for multivariate data and "V"
for univariate data, but not always. Furthermore, the output will drop certain row and column names from the result.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
See Also
MoE_clust
, MoE_gpairs
, plot.MoEClust
, as.Mclust
Examples
data(ais)
res <- MoE_clust(ais[,3:7], G=2, gating= ~ 1, expert= ~ sex,
network.data=ais, modelNames="EEE", equalPro=TRUE)
# Extract the variance object
res$parameters$variance
# Modify the variance object
expert_covar(res)
Force diagonal elements of a triangular matrix to be positive
Description
This function ensures that the triangular matrix in a QR (or other) decomposition has positive values along its diagonal.
Usage
force_posiDiag(x)
Arguments
x |
A matrix, which must be either upper-triangular or lower-triangular. |
Value
An upper or lower triangular matrix with positive diagonal entries such that the matrix is still a valid decomposition of the matrix the input x
is a decomposition of.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
Examples
data(ais)
res <- MoE_clust(ais[,3:7], G=3, modelNames="EEE")
sig <- res$parameters$variance
a <- force_posiDiag(sig$cholSigma)
b <- chol(sig$Sigma)
all.equal(a, b) #TRUE
all.equal(crossprod(a), sig$Sigma) #TRUE
all.equal(crossprod(b), sig$Sigma) #TRUE
Approximate Hypervolume Estimate
Description
Computes simple approximations to the hypervolume of univariate and multivariate data sets. Also returns the location of the centre of mass.
Usage
noise_vol(data,
method = c("hypvol", "convexhull", "ellipsoidhull"),
reciprocal = FALSE)
Arguments
data |
A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed, and covariates should not be included. If a matrix or data frame, rows correspond to observations and columns correspond to variables. There must be more observations than variables. |
method |
The method used to estimate the hypervolume. The default method uses the function |
reciprocal |
A logical variable indicating whether or not the reciprocal hypervolume is desired rather than the hypervolume itself. The default is to return the hypervolume. |
Value
A list with the following two elements:
vol
A hypervolume estimate (or its inverse).
This can be used as the hypervolume parameter for the noise component when observations are designated as noise in
MoE_clust
.loc
A vector of length
ncol(data)
giving the location of the centre of mass.This can help in predicting the fitted values of models fitted with noise components via
MoE_clust
.
Note
This function is called when adding a noise component to MoEClust
models via the function MoE_control
, specifically using its arguments noise.meth
&/or tau0
. The function internally only uses the response variables, and not the covariates. However, one can bypass the invocation of this function by specifying the noise.vol
argument of MoE_control
directly. This is explicitly necessary for models for high-dimensional data which include a noise component for which this function cannot estimate a (hyper)volume.
Note that supplying the volume manually to MoE_clust
can affect the summary of the means in parameters$mean
and by extension the location of the MVN ellipses in MoE_gpairs
plots for models with both expert network covariates and a noise component. The location cannot be estimated when the volume is supplied manually; in this case, prediction is made on the basis of renormalising the z
matrix after discarding the column corresponding to the noise component. Otherwise, the mean of the noise component is accounted for. The renormalisation approach can be forced by specifying noise.args$discard.noise=TRUE
, even when the mean of the noise component is available.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
hypvol
, convhulln
, ellipsoidhull
Examples
data(ais)
noise_vol(ais[,3:7], reciprocal=TRUE)
noise_vol(ais[,3:7], reciprocal=FALSE, method="convexhull")
Plot MoEClust Results
Description
Plot results for fitted MoE_clust mixture models with gating &/or expert network covariates: generalised pairs plots, model selection criteria, the log-likelihood vs. the EM iterations, and the gating network are all currently visualisable.
Usage
## S3 method for class 'MoEClust'
plot(x,
what = c("gpairs", "gating", "criterion", "loglik", "similarity", "uncertainty"),
...)
Arguments
x |
An object of class |
what |
The type of graph requested:
By default, all of the above graphs are produced. |
... |
Optional arguments to be passed to |
Details
For more flexibility in plotting, use MoE_gpairs
, MoE_plotGate
, MoE_plotCrit
, MoE_plotLogLik
, MoE_Similarity
, and MoE_Uncertainty
directly.
Value
The visualisation according to what
of the results of a fitted MoEClust
model.
Note
Other plotting options are available by first calling as.Mclust
on the fitted object, and then calling plot.Mclust
on the results. However, caution is advised for models with gating &/or expert covariates (see the Note in as.Mclust
).
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
See Also
MoE_clust
, MoE_stepwise
, MoE_gpairs
, MoE_plotGate
, MoE_plotCrit
, MoE_plotLogLik
, MoE_Similarity
, MoE_Uncertainty
, as.Mclust
, plot.Mclust
Examples
data(ais)
res <- MoE_clust(ais[,3:7], gating= ~ BMI, expert= ~ sex,
G=2, modelNames="EVE", network.data=ais)
# Plot the gating network
plot(res, what="gating", x.axis=ais$BMI, xlab="BMI")
# Plot the log-likelihood
plot(res, what="loglik", col="blue")
# Plot the uncertainty profile
plot(res, what="uncertainty", type="profile")
# Produce a generalised pairs plot
plot(res, what="gpairs")
# Produce a heatmap of the similarity matrix
plot(res, what="similarity")
# Modify the gpairs plot by passing arguments to MoE_gpairs()
plot(res, what="gpairs", response.type="density", varwidth=TRUE, show.dens=TRUE,
data.ind=c(5,3,4,1,2), jitter=FALSE, show.counts=FALSE)
Predictions for MoEClust models
Description
Predicts both cluster membership probabilities and fitted response values from a MoEClust
model, using covariates and response data, or covariates only. The predicted MAP classification, mixing proportions, and component means are all also reported in both cases, as well as the predictions of the expert network corresponding to the most probable component.
Usage
## S3 method for class 'MoEClust'
predict(object,
newdata,
resid = FALSE,
discard.noise = FALSE,
MAPresids = FALSE,
use.y = TRUE,
...)
## S3 method for class 'MoEClust'
fitted(object,
...)
## S3 method for class 'MoEClust'
residuals(object,
newdata,
...)
Arguments
object |
An object of class |
newdata |
A list with two named components, each of which must be a
If supplied as a list with elements Alternatively, a single When |
resid |
A logical indicating whether to return the residuals also. Defaults to |
discard.noise |
A logical governing how predictions of the responses are made for models with a noise component (otherwise this argument is irrelevant). By default ( |
MAPresids |
A logical indicating whether residuals are computed against |
use.y |
A logical indicating whether the response variables (if any are supplied either via |
... |
Catches unused arguments (and allows the |
Details
Predictions can also be made for models with a noise component, in which case z
will include the probability of belonging to "Cluster0"
& classification
will include labels with the value 0
for observations classified as noise (if any). The argument discard.noise
governs how the responses are predicted in the presence of a noise component (see noise_vol
for more details).
Note that the argument discard.noise
is invoked for any models with a noise component, while the similar MoE_control
argument noise.args$discard.noise
is only invoked for models with both a noise component and expert network covariates.
Please be aware that a model considered optimal from a clustering point of view may not necessarily be optimal from a prediction point of view. In particular, full MoE models with covariates in both networks (for which both the cluster membership probabilities and component means are observation-specific) are recommended for out-of-sample prediction when only new covariates are observed (see new.x
and new.y
above, as well as use.y
).
Value
A list with the following named components, regardless of whether newdata$new.x
and newdata$new.y
were used, or newdata$new.x
only.
y |
Aggregated fitted values of the response variables. |
z |
A matrix whose |
classification |
The vector of predicted cluster labels for the |
pro |
The predicted mixing proportions for the |
mean |
The predicted component means for the |
MAPy |
Fitted values of the single expert network to which each observation is most probably assigned. Not returned for models with equal mixing proportions when only |
When residuals
is called, only the residuals (governed by MAPresids
) are returned; when predict
is called with resid=TRUE
, the list above will also contain the element resids
, containing the residuals.
The returned values of pro
and mean
are always the same, regardless of whether newdata$new.x
and newdata$new.y
were used, or newdata$new.x
only.
Finally, fitted
is simply a wrapper to predict.MoEClust(object)$y
without any newdata
, and with the resid
and MAPresids
arguments also ignored.
Note
Note that a dedicated predict
function is also provided for objects of class "MoE_gating"
(typically object$gating
, where object
is of class "MoEClust"
). This function is effectively a shortcut to predict(object, ...)$pro
, which (unlike the predict
method for multinom
on which it is based) accounts for the various ways of treating gating covariates and noise components, although its type
argument defaults to "probs"
rather than "class"
. Notably, its keep.noise
argument behaves differently from the discard.noise
argument here; here, the noise component is only discarded in the computation of the predicted responses. See predict.MoE_gating
for further details.
Similarly, a dedicated predict
function is also provided for objects of class "MoE_expert"
(typically object$expert
, where object
is of class "MoE_expert"
). This function is effectively a wrapper to predict(object, ...)$mean
, albeit it returns a list (by default) rather than a 3-dimensional array and also always preserves the dimensions of newdata
, even for models without expert network covariates. See predict.MoE_expert
for further details.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
See Also
MoE_clust
, MoE_control
, noise_vol
, predict.MoE_gating
, predict.MoE_expert
Examples
data(ais)
# Fit a MoEClust model and predict the same data
res <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
modelNames="EVE", network.data=ais)
pred1 <- predict(res)
# Get only the fitted responses
fits <- fitted(res)
all.equal(pred1$y, fits) #TRUE
# Remove some rows of the data for prediction purposes
ind <- sample(1:nrow(ais), 5)
dat <- ais[-ind,]
# Fit another MoEClust model to the retained data
res2 <- MoE_clust(dat[,3:7], G=3, gating= ~ BMI + sex,
modelNames="EEE", network.data=dat)
# Predict held back data using the covariates & response variables
(pred2 <- predict(res2, newdata=ais[ind,]))
# pred2 <- predict(res2, newdata=list(new.y=ais[ind,3:7],
# new.x=ais[ind,c("BMI", "sex")]))
# Get the residuals
residuals(res2, newdata=ais[ind,])
# Predict held back data using only the covariates
(pred3 <- predict(res2, newdata=ais[ind,], use.y=FALSE))
# pred3 <- predict(res2, newdata=list(new.x=ais[ind,c("BMI", "sex")]))
# pred3 <- predict(res2, newdata=ais[ind,c("BMI", "sex")])
Predictions from MoEClust expert networks
Description
Predictions (point estimates) of observation-specific component means from each (non-noise) component's expert network linear regression.
Usage
## S3 method for class 'MoE_expert'
predict(object,
newdata = NULL,
simplify = FALSE,
droplevels = FALSE,
...)
## S3 method for class 'MoE_expert'
fitted(object,
...)
## S3 method for class 'MoE_expert'
residuals(object,
...)
Arguments
object |
An object of class |
newdata |
A matrix or data frame of test examples. If omitted, the fitted values are used. |
simplify |
Logical indicating whether to simplify the output (in the form of a list) to a 3-dimensional array with dimensions given by the number of new observations, the number of variables, and the number of clusters. The first dimension of such an array is of length |
droplevels |
A logical indicating whether unseen factor levels in categorical variables within |
... |
Catches unused arguments or allows the |
Details
This function is effectively just a shortcut to lapply(x$expert, predict.lm, newdata=...)
. It can also be thought of as a wrapper to predict.MoEClust(x, ...)$mean
, although it returns a list (by default) rather than a 3-dimensional array and also always preserves the dimensions of newdata
, even for models without expert network covariates.
Value
For simplify=FALSE
, either a list of vectors or predictions (for univariate data) or a list of matrices of predictions (for multivariate data). These lists are of the same length as number of non-noise components in the fitted model. When simplify=TRUE
, a 3-dimensional array of predictions is returned, with respective dimensions given by the number of observations, variables, and non-noise components.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
predict.MoEClust
, lm
, predict.MoE_gating
, drop_levels
Examples
data(CO2data)
res <- MoE_clust(CO2data$CO2, G=3, equalPro=TRUE, expert= ~ GNP, network.data=CO2data)
predict(res$expert)
# Try with newdata and simplify=TRUE
predict(res$expert, newdata=CO2data[1:5,"GNP", drop=FALSE], simplify=TRUE)
Predictions from MoEClust gating networks
Description
Predicts mixing proportions from MoEClust gating networks. Effectively akin to predicting from a multinomial logistic regression via multinom
, although here the noise component (if any) is properly accounted for. So too are models with no gating covariates at all, or models with the equal mixing proportion constraint. Prior probabilities are returned by default.
Usage
## S3 method for class 'MoE_gating'
predict(object,
newdata = NULL,
type = c("probs", "class"),
keep.noise = TRUE,
droplevels = FALSE,
...)
## S3 method for class 'MoE_gating'
fitted(object,
...)
## S3 method for class 'MoE_gating'
residuals(object,
...)
Arguments
object |
An object of class |
newdata |
A matrix or data frame of test examples. If omitted, the fitted values are used. |
type |
The type of output desired. The default ( |
keep.noise |
A logical indicating whether the output should acknowledge the noise component (if any). Defaults to |
droplevels |
A logical indicating whether unseen factor levels in categorical variables within |
... |
Catches unused arguments or allows the |
Details
This function is effectively a shortcut to predict.MoEClust(x, ...)$pro
, which (unlike the predict
method for multinom
on which predict.MoE_gating
is based) accounts for the various ways of treating gating covariates, equal mixing proportion constraints, and noise components, although its type
argument defaults to "probs"
rather than "class"
.
Value
The return value depends on whether newdata
is supplied or not and whether the model includes gating covariates to begin with. When newdata
is not supplied, the fitted values are returned (as a matrix if the model contained gating covariates, otherwise as a vector as per x$parameters$pro
). If newdata
is supplied, the output is always a matrix with the same number of rows as the newdata
.
Note
Note that the keep.noise
argument does not correspond in any way to the discard.noise
argument to predict.MoEClust
; there, the noise component is respected in the computation of the mixing proportions and only discarded (if at all) in the prediction of the responses.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
See Also
predict.MoEClust
, multinom
, predict.MoE_expert
, drop_levels
Examples
data(ais)
mod <- MoE_clust(ais[,3:7], G=2, modelNames="EEE", gating= ~ SSF + Ht,
expert= ~ sex, network.data=ais, tau0=0.1, noise.gate=FALSE)
(preds <- predict(mod$gating, newdata=ais[1:5,]))
all.equal(preds, predict(mod, newdata=ais[1:5,])$pro) #TRUE
# Note that the predictions are not the same as the multinom predict method
# in this instance, owing to the invocation of noise.gate=FALSE above
mod2 <- mod
class(mod2$gating) <- c("multinom", "nnet")
predict(mod2$gating, newdata=ais[1:5,], type="probs")
# We can make this function behave in the same way by invoking keep.noise=FALSE
predict(mod$gating, keep.noise=FALSE, newdata=ais[1:5,])
# ... although keep.noise=FALSE in predict.MoE_gating does not
# yield the same output as discard.noise=TRUE in predict.MoEClust
predict(mod, discard.noise=TRUE, newdata=ais[1:5,])$pro
Quantile-Based Clustering for Univariate Data
Description
Returns a quantile-based clustering for univariate data.
Usage
quant_clust(x,
G)
Arguments
x |
A vector of numeric data. |
G |
The desired number of clusters. |
Value
The vector of cluster labels.
Examples
data(CO2data)
quant_clust(CO2data$CO2, G=2)