Type: | Package |
Title: | Clustering Data with Non-Ignorable Missingness using Semi-Parametric Mixture Models |
Version: | 1.1.0 |
Description: | Clustering of data under a non-ignorable missingness mechanism. Clustering is achieved by a semi-parametric mixture model and missingness is managed by using the pattern-mixture approach. More details of the approach are available in Du Roy de Chaumaray et al. (2020) <doi:10.48550/arXiv.2009.07662>. |
Maintainer: | Matthieu Marbac <matthieu.marbac-lourdelle@ensai.fr> |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Imports: | Rcpp, parallel, sn, rmutil |
LinkingTo: | Rcpp, RcppArmadillo |
ByteCompile: | true |
URL: | https://arxiv.org/abs/2009.07662 |
Author: | Marie Du Roy de Chaumaray [aut], Matthieu Marbac [aut, cre, cph] |
Collate: | algoCat.R algoCont.R algoMixed.R MNARclust.R RcppExports.R sampler.R tools.R |
LazyData: | true |
LazyLoad: | yes |
Encoding: | UTF-8 |
Depends: | R (≥ 3.5) |
RoxygenNote: | 7.1.0 |
NeedsCompilation: | yes |
Packaged: | 2021-12-02 13:04:54 UTC; matt |
Repository: | CRAN |
Date/Publication: | 2021-12-02 13:20:10 UTC |
MNARclust.
Description
Clustering method to analyze continuous or mixed-type data with missingness. The missingness mechanism can be non-ignorable. The approach is based on a semi-parametric mixture model.
Details
Package: | MNARclust |
Type: | Package |
Version: | 1.1.0 |
Date: | 2021-12-01 |
License: | GPL-3 |
LazyLoad: | yes |
Clustering function
Description
Clustering method to analyze continuous or mixed-type data with missingness. The missingness mechanism can be non-ignorable. The approach is based on a semi-parametric mixture model.
Usage
MNARcluster(
x,
K,
nbinit = 20,
nbCPU = 1,
tol = 0.01,
band = band.default(x),
seedvalue = 123
)
Arguments
x |
matrix used for clustering |
K |
number of components |
nbinit |
number of random starting points |
nbCPU |
number of CPUs used for parallel computing (parallel computing is only available on Unix and Linux systems) |
tol |
tolerance used in the stopping rule |
band |
bandwidth (numeric vector). |
seedvalue |
value of the seed (used to set the initializations of the MM algorithm) |
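The default band = band.default(x) is not described in this manual. As a purely illustrative sketch, one common way to obtain a per-variable kernel bandwidth is Silverman's rule of thumb (stats::bw.nrd0) applied to the observed entries of each column. The helper band_sketch below is hypothetical and is not claimed to reproduce band.default; it only shows the kind of numeric vector (one bandwidth per variable) that band expects.

```r
# Hypothetical helper (not part of MNARclust): one bandwidth per column,
# computed with Silverman's rule of thumb on the non-missing entries only.
band_sketch <- function(x) {
  apply(x, 2, function(col) bw.nrd0(col[!is.na(col)]))
}

# Small data matrix with missing values coded as NA
x <- matrix(c(1, 2, 3, NA, 5,
              2, 4, 6, 8, NA), ncol = 2)
band_sketch(x)
```

Whatever rule is used, passing an explicit vector through the band argument overrides the default.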
Value
Returns a list containing the proportions (proportions), the matrix of probabilities of missingness (rho), the posterior probabilities of classification (classproba), the estimated partition (zhat) and the logarithm of the smoothed likelihood (logSmoothlike).
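Assuming classproba is an n × K matrix whose rows sum to one, a partition such as zhat is usually derived from it by the maximum a posteriori (MAP) rule: each observation is assigned to its most probable component. The sketch below applies that rule with base R's max.col to a small hypothetical matrix; it illustrates the rule and is not taken from the package source.

```r
# Hypothetical posterior probabilities for 4 observations and K = 2 components
classproba <- matrix(c(0.9, 0.1,
                       0.2, 0.8,
                       0.6, 0.4,
                       0.3, 0.7), ncol = 2, byrow = TRUE)

# MAP rule: pick the column (component) with the largest posterior probability
zhat <- max.col(classproba, ties.method = "first")
zhat  # returns c(1, 2, 1, 2)
```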
References
Clustering Data with Non-Ignorable Missingness using Semi-Parametric Mixture Models, Marie Du Roy de Chaumaray and Matthieu Marbac <arXiv:2009.07662>.
Examples
set.seed(123)
# Data generation
ech <- rMNAR(n=100, K=2, d=4, delta=2, gamma=2)
# Clustering
res <- MNARcluster(ech$x, K=2)
# Confusion matrix between the estimated and the true partition
table(res$zhat, ech$z)
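Cluster labels returned by a mixture model are arbitrary, so the confusion matrix may show the estimated clusters with permuted labels. The hypothetical helper error_rate below (not part of MNARclust) computes the misclassification rate minimised over all relabellings, assuming both partitions are coded as integers 1, ..., K; enumerating all K! permutations is only practical for small K.

```r
# All permutations of a vector (recursive; fine for small K)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Misclassification rate of zhat against z, minimised over relabellings of zhat
error_rate <- function(zhat, z) {
  K <- max(c(zhat, z))
  best <- 1
  for (p in perms(seq_len(K))) {
    best <- min(best, mean(p[zhat] != z))  # p[zhat] relabels k as p[k]
  }
  best
}
```

With the objects from the example above, error_rate(res$zhat, ech$z) would give a single label-invariant error rate.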
Echocardiogram data set
Description
All the patients suffered heart attacks at some point in the past. Some are still alive and some are not. The survival and still-alive variables, when taken together, indicate whether a patient survived for at least one year following the heart attack.
Format
A data frame with 132 observations on 13 variables (more details on this data set are presented in http://archive.ics.uci.edu/ml/datasets/Echocardiogram).
Details
This data set comes from the UCI Machine Learning Repository (see http://archive.ics.uci.edu/ml/datasets/Echocardiogram).
References
Salzberg, S. (1988). Exemplar-based learning: Theory and implementation (Technical Report TR-10-88). Harvard University, Center for Research in Computing Technology, Aiken Computation Laboratory (33 Oxford Street; Cambridge, MA 02138).
Examples
data(echo)
Function used to simulate data from a mixture model with a specific missingness mechanism
Description
Generates data sets for the simulations presented in Section 4.1 of Du Roy de Chaumaray et al. (2020).
Usage
rMNAR(
n,
K,
d = 3,
delta = 3,
gamma = 1,
law = "gauss",
linkmissing = "logit-X"
)
Arguments
n |
sample size (numeric of length 1) |
K |
number of clusters (numeric of length 1) |
d |
number of variables (numeric of length 1) |
delta |
tuning parameter to define the rate of misclassification (numeric of length 1) |
gamma |
tuning parameter to define the rate of missingness (numeric of length 1) |
law |
specifies the distribution of the variables within components (character string, one of "gauss", "student", "laplace" or "skewgauss") |
linkmissing |
specifies the missingness mechanism (character string, one of "MCAR", "logit-Z", "logit-X" or "censoring") |
Value
rMNAR returns a list containing the observed data (x), the true cluster memberships (z), the complete data (xfull), the cluster memberships given by the Bayes rule (zhat), and the empirical rates of misclassification (meanerrorclass) and missingness (meanmiss).
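The name linkmissing = "logit-X" suggests a mechanism in which the probability that an entry is missing depends on its own, possibly unobserved, value through a logistic link, which makes the missingness non-ignorable. The base-R sketch sim_mnar below is hypothetical: it simulates a two-component Gaussian mixture with such a mechanism and returns components named like those of rMNAR (x, z, xfull, meanmiss), but its use of delta (mean shift) and gamma (slope of the logistic link) is an assumption and is not the parameterisation used by rMNAR.

```r
# Hypothetical simulator (not rMNAR): two Gaussian components in d dimensions
# with a logistic missingness mechanism that depends on the value itself.
sim_mnar <- function(n, d = 3, delta = 3, gamma = 1) {
  z <- sample(1:2, n, replace = TRUE)              # true cluster memberships
  mu <- ifelse(z == 1, 0, delta)                   # component 2 shifted by delta
  xfull <- matrix(rnorm(n * d, mean = mu), n, d)   # complete data (mu recycled by row)
  pmiss <- plogis(gamma * (xfull - delta))         # larger values are more often missing
  x <- xfull
  x[runif(n * d) < pmiss] <- NA                    # observed data
  list(x = x, z = z, xfull = xfull, meanmiss = mean(is.na(x)))
}

set.seed(123)
sim <- sim_mnar(n = 100)
sim$meanmiss  # empirical rate of missingness
```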
References
Clustering Data with Non-Ignorable Missingness using Semi-Parametric Mixture Models, Marie Du Roy de Chaumaray and Matthieu Marbac <arXiv:2009.07662>.
Examples
set.seed(123)
# Data generation
ech <- rMNAR(n=100, K=3, d=3, delta=2, gamma=1)
# Head of the observed data
head(ech$x)
# Table of the cluster memberships
table(ech$z)
# Empirical rate of misclassification
ech$meanerrorclass
# Empirical rate of missingness
ech$meanmiss