Version: | 3.8 |
Date: | 2024-09-16 |
Title: | Bayesian Survival Regression with Flexible Error and Random Effects Distributions |
Depends: | R (≥ 3.0.0), survival, coda, smoothSurv |
Imports: | graphics, stats, utils |
Description: | Contains Bayesian implementations of the Mixed-Effects Accelerated Failure Time (MEAFT) models for censored data. Those can be not only right-censored but also interval-censored, doubly-interval-censored or misclassified interval-censored. The methods implemented in the package have been published in Komárek and Lesaffre (2006, Stat. Modelling) <doi:10.1191/1471082X06st107oa>, Komárek, Lesaffre and Legrand (2007, Stat. in Medicine) <doi:10.1002/sim.3083>, Komárek and Lesaffre (2007, Stat. Sinica) https://www3.stat.sinica.edu.tw/statistica/oldpdf/A17n27.pdf, Komárek and Lesaffre (2008, JASA) <doi:10.1198/016214507000000563>, García-Zattera, Jara and Komárek (2016, Biometrics) <doi:10.1111/biom.12424>. |
Encoding: | UTF-8 |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://msekce.karlin.mff.cuni.cz/~komarek/ |
ZipData: | no |
NeedsCompilation: | yes |
Packaged: | 2024-09-16 11:45:15 UTC; komarek |
Author: | Arnošt Komárek |
Maintainer: | Arnošt Komárek <arnost.komarek@mff.cuni.cz> |
Repository: | CRAN |
Date/Publication: | 2024-09-16 12:40:02 UTC |
Population-averaged accelerated failure time model for bivariate, possibly doubly-interval-censored data. The error distribution is expressed as a penalized bivariate normal mixture with high number of components (bivariate G-spline).
Description
A function to estimate a regression model with bivariate (possibly right-, left-, interval- or doubly-interval-censored) data. In the case of doubly interval censoring, different regression models can be specified for the onset and event times.
The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variance matrices). This function performs an MCMC sampling from the posterior distribution of unknown quantities.
For details, see Komárek (2006) and Komárek and Lesaffre (2006).
We explain first in more detail a model without doubly censoring.
Let T_{i,l},\; i=1,\dots, N,\; l=1, 2
be event times for i
th cluster and the first and the second
unit. The following regression model is assumed:
\log(T_{i,l}) = \beta'x_{i,l} + \varepsilon_{i,l},\quad i=1,\dots,N,\;l=1,2
where \beta
is unknown regression parameter vector and
x_{i,l}
is a vector of covariates. The bivariate error terms
\varepsilon_i=(\varepsilon_{i,1},\,\varepsilon_{i,2})',\;i=1,\dots,N
are assumed to be i.i.d. with a bivariate density
g_{\varepsilon}(e_1,\,e_2)
. This density is expressed as
a mixture of Bayesian G-splines (normal densities with equidistant
means and constant variance matrices). We distinguish two,
theoretically equivalent, specifications.
- Specification 1
-
(\varepsilon_1,\,\varepsilon_2)' \sim \sum_{j_1=-K_1}^{K_1}\sum_{j_2=-K_2}^{K_2} w_{j_1,j_2} N_2(\mu_{(j_1,j_2)},\,\mbox{diag}(\sigma_1^2,\,\sigma_2^2))
where
\sigma_1^2,\,\sigma_2^2
are unknown basis variances and\mu_{(j_1,j_2)} = (\mu_{1,j_1},\,\mu_{2,j_2})'
is an equidistant grid of knots symmetric around the unknown point(\gamma_1,\,\gamma_2)'
and related to the unknown basis variances through the relationship\mu_{1,j_1} = \gamma_1 + j_1\delta_1\sigma_1,\quad j_1=-K_1,\dots,K_1,
\mu_{2,j_2} = \gamma_2 + j_2\delta_2\sigma_2,\quad j_2=-K_2,\dots,K_2,
where
\delta_1,\,\delta_2
are fixed constants, e.g.\delta_1=\delta_2=2/3
(which has a justification of being close to cubic B-splines). - Specification 2
-
(\varepsilon_1,\,\varepsilon_2)' \sim (\alpha_1,\,\alpha_2)'+ \bold{S}\,(V_1,\,V_2)'
where
(\alpha_1,\,\alpha_2)'
is an unknown intercept term and\bold{S} \mbox{ is a diagonal matrix with } \tau_1 \mbox{ and }\tau_2 \mbox{ on a diagonal,}
i.e.\tau_1,\,\tau_2
are unknown scale parameters.(V_1,\,V_2)')
is then standardized bivariate error term which is distributed according to the bivariate normal mixture, i.e.(V_1,\,V_2)'\sim \sum_{j_1=-K_1}^{K_1}\sum_{j_2=-K_2}^{K_2} w_{j_1,j_2} N_2(\mu_{(j_1,j_2)},\,\mbox{diag}(\sigma_1^2, \sigma_2^2))
where
\mu_{(j_1,j_2)} = (\mu_{1,j_1},\,\mu_{2,j_2})'
is an equidistant grid of fixed knots (means), usually symmetric about the fixed point(\gamma_1,\,\gamma_2)'=(0, 0)'
and\sigma_1^2,\,\sigma_2^2
are fixed basis variances. Reasonable values for the numbers of grid pointsK_1
andK_2
areK_1=K_2=15
with the distance between the two knots equal to\delta=0.3
and for the basis variances\sigma_1^2\sigma_2^2=0.2^2.
Personally, I found Specification 2 performing better. In the paper Komárek and Lesaffre (2006) only Specification 2 is described.
The mixture weights
w_{j_1,j_2},\;j_1=-K_1,\dots, K_1,\;j_2=-K_2,\dots,
K_2
are
not estimated directly. To avoid the constraints
0 < w_{j_1,j_2} < 1
and
\sum_{j_1=-K_1}^{K_1}\sum_{j_2=-K_2}^{K_2}w_{j_1,j_2} =
1
transformed weights a_{j_1,j_2},\;j_1=-K_1,\dots, K_1,\;j_2=-K_2,\dots,
K_2
related to the original weights by the logistic transformation:
a_{j_1,j_2} =
\frac{\exp(w_{j_1,j_2})}{\sum_{m_1}\sum_{m_2}\exp(w_{m_1,m_2})}
are estimated instead.
A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek and Lesaffre (2006) and to Komárek (2006).
If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.
Usage
bayesBisurvreg(formula, formula2, data = parent.frame(),
na.action = na.fail, onlyX = FALSE,
nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
prior, prior.beta, init = list(iter = 0),
mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1),
prior2, prior.beta2, init2,
mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1),
store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
r = FALSE, r2 = FALSE),
dir)
Arguments
formula |
model formula for the regression. In the case of
doubly-censored data, this is the model formula for the onset
time. Data are assumed to be sorted according to subjects and within
subjects according to the types of the events that determine the
bivariate survival distribution, i.e. the response vector must be
The left-hand side of the formula must be an object created using
|
formula2 |
model formula for the regression of the time-to-event in
the case of doubly-censored data. Ignored otherwise. The same remark as
for |
data |
optional data frame in which to interpret the variables occuring in the formulas. |
na.action |
the user is discouraged from changing the default
value |
onlyX |
if |
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
|
prior |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by |
prior.beta |
prior specification for the regression parameters,
in the case of doubly censored data for the regression parameters of
the onset time. I.e. it is related to This should be a list with the following components:
It is recommended to run the function
bayesBisurvreg first with its argument |
init |
an optional list with initial values for the MCMC related
to the model given by
and
|
mcmc.par |
a list specifying how some of the G-spline parameters
related to
|
prior2 |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by |
prior.beta2 |
prior specification for the regression parameters
of time-to-event in the case of doubly censored data (related to
|
init2 |
an optional list with initial values for the MCMC related
to the model given by |
mcmc.par2 |
a list specifying how some of the G-spline parameters
related to |
store |
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
|
dir |
a string that specifies a directory where all sampled values are to be stored. |
Value
A list of class bayesBisurvreg
containing an information
concerning the initial values and prior choices.
Files created
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. During the burn-in, only
every nsimul$nwrite
value is written. After the burn-in, all
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
- iteration.sim
one column labeled
iteration
with indeces of MCMC iterations to which the stored sampled values correspond.- mixmoment.sim
columns labeled
k
,Mean.1
,Mean.2
,D.1.1
,D.2.1
,D.2.2
, wherek = number of mixture components that had probability numerically higher than zero;
Mean.1 =
\mbox{E}(\varepsilon_{i,1})
;Mean.2 =
\mbox{E}(\varepsilon_{i,2})
;D.1.1 =
\mbox{var}(\varepsilon_{i,1})
;D.2.1 =
\mbox{cov}(\varepsilon_{i,1},\,\varepsilon_{i,2})
;D.2.2 =
\mbox{var}(\varepsilon_{i,2})
;all related to the distribution of the error term from the model given by
formula
.- mixmoment_2.sim
in the case of doubly-censored data, the same structure as
mixmoment.sim
, however related to the model given byformula2
.- mweight.sim
sampled mixture weights
w_{k_1,\,k_2}
of mixture components that had probabilities numerically higher than zero. Related to the model given byformula
.- mweight_2.sim
in the case of doubly-censored data, the same structure as
mweight.sim
, however related to the model given byformula2
.- mmean.sim
indeces
k_1,\;k_2,
k_1 \in\{-K_1, \dots, K_1\},
k_2 \in\{-K_2, \dots, K_2\}
of mixture components that had probabilities numerically higher than zero. It corresponds to the weights inmweight.sim
. Related to the model given byformula
.- mmean_2.sim
in the case of doubly-censored data, the same structure as
mmean.sim
, however related to the model given byformula2
.- gspline.sim
characteristics of the sampled G-spline (distribution of
(\varepsilon_{i,1},\,\varepsilon_{i,2})'
) related to the model given byformula
. This file together withmixmoment.sim
,mweight.sim
andmmean.sim
can be used to reconstruct the G-spline in each MCMC iteration.The file has columns labeled
gamma1
,gamma2
,sigma1
,sigma2
,delta1
,delta2
,intercept1
,intercept2
,scale1
,scale2
. The meaning of the values in these columns is the following:gamma1 = the middle knot
\gamma_1
in the first dimension. If ‘Specification’ is 2, this column usually contains zeros;gamma2 = the middle knot
\gamma_2
in the second dimension. If ‘Specification’ is 2, this column usually contains zeros;sigma1 = basis standard deviation
\sigma_1
of the G-spline in the first dimension. This column contains a fixed value if ‘Specification’ is 2;sigma2 = basis standard deviation
\sigma_2
of the G-spline in the second dimension. This column contains a fixed value if ‘Specification’ is 2;delta1 = distance
delta_1
between the two knots of the G-spline in the first dimension. This column contains a fixed value if ‘Specification’ is 2;delta2 = distance
\delta_2
between the two knots of the G-spline in the second dimension. This column contains a fixed value if ‘Specification’ is 2;intercept1 = the intercept term
\alpha_1
of the G-spline in the first dimension. If ‘Specification’ is 1, this column usually contains zeros;intercept2 = the intercept term
\alpha_2
of the G-spline in the second dimension. If ‘Specification’ is 1, this column usually contains zeros;scale1 = the scale parameter
\tau_1
of the G-spline in the first dimension. If ‘Specification’ is 1, this column usually contains ones;scale2 = the scale parameter
\tau_2
of the G-spline in the second dimension. ‘Specification’ is 1, this column usually contains ones.- gspline_2.sim
in the case of doubly-censored data, the same structure as
gspline.sim
, however related to the model given byformula2
.- mlogweight.sim
fully created only if
store$a = TRUE
. The file contains the transformed weightsa_{k_1,\,k_2},
k_1=-K_1,\dots,K_1,
k_2=-K_2,\dots,K_2
of all mixture components, i.e. also of components that had numerically zero probabilities. This file is related to the model given byformula
.- mlogweight_2.sim
fully created only if
store$a2 = TRUE
and in the case of doubly-censored data, the same structure asmlogweight.sim
, however related to the model given byformula2
.- r.sim
fully created only if
store$r = TRUE
. The file contains the labels of the mixture components into which the residuals are intrinsically assigned. Instead of double indeces(k_1,\,k_2)
, values from 1 to(2\,K_1+1)\times (2\,K_2+1)
are stored here. Functionvecr2matr
can be used to transform it back to double indeces.- r_2.sim
fully created only if
store$r2 = TRUE
and in the case of doubly-censored data, the same structure asr.sim
, however related to the model given byformula2
.- lambda.sim
either one column labeled
lambda
or two columns labeledlambda1
andlambda2
. These are the values of the smoothing parameter(s)\lambda
(hyperparameters of the prior distribution of the transformed mixture weightsa_{k_1,\,k_2}
). This file is related to the model given byformula
.- lambda_2.sim
in the case of doubly-censored data, the same structure as
lambda.sim
, however related to the model given byformula2
.- beta.sim
sampled values of the regression parameters
\beta
related to the model given byformula
. The columns are labeled according to thecolnames
of the design matrix.- beta_2.sim
in the case of doubly-censored data, the same structure as
beta.sim
, however related to the model given byformula2
.- Y.sim
fully created only if
store$y = TRUE
. It contains sampled (augmented) log-event times for all observations in the data set.- Y_2.sim
fully created only if
store$y2 = TRUE
and in the case of doubly-censored data, the same structure asY.sim
, however related to the model given byformula2
.- logposter.sim
columns labeled
loglik
,penalty
orpenalty1
andpenalty2
,logprw
. This file is related to the model given byformula
. The columns have the following meaning.loglik
=
% -N\Bigl\{\log(2\pi) + \log(\sigma_1) + \log(\sigma_2)\Bigr\}- 0.5\sum_{i=1}^N\Bigl\{ (\sigma_1^2\,\tau_1^2)^{-1}\; (y_{i,1} - x_{i,1}'\beta - \alpha_1 - \tau_1\mu_{1,\,r_{i,1}})^2 + (\sigma_2^2\,\tau_2^2)^{-1}\; (y_{i,2} - x_{i,2}'\beta - \alpha_2 - \tau_2\mu_{2,\,r_{i,2}})^2 \Bigr\}
where
y_{i,l}
denotes (augmented) (i,l)th true log-event time. In other words,loglik
is equal to the conditional log-density\sum_{i=1}^N\,\log\Bigl\{p\bigl((y_{i,1},\,y_{i,2})\;\big|\;r_{i},\,\beta,\,\mbox{G-spline}\bigr)\Bigr\};
penalty1: If
prior$neighbor.system
="uniCAR"
: the penalty term for the first dimension not multiplied bylambda1
;penalty2: If
prior$neighbor.system
="uniCAR"
: the penalty term for the second dimension not multiplied bylambda2
;penalty: If
prior$neighbor.system
is different from"uniCAR"
: the penalty term not multiplied bylambda
;logprw
=
-2\,N\,\log\bigl\{\sum_{k_1}\sum_{k_2}a_{k_1,\,k_2}\bigr\} + \sum_{k_1}\sum_{k_2}N_{k_1,\,k_2}\,a_{k_1,\,k_2},
whereN_{k_1,\,k_2}
is the number of residuals assigned intrinsincally to the(k_1,\,k_2)
th mixture component.In other words,
logprw
is equal to the conditional log-density\sum_{i=1}^N \log\bigl\{p(r_i\;|\;\mbox{G-spline weights})\bigr\}.
- logposter_2.sim
in the case of doubly-censored data, the same structure as
lambda.sim
, however related to the model given byformula2
.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337 - 348.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failure time model for paired doubly interval-censored data. Statistical Modelling, 6, 3 - 22.
Neal, R. M. (2003). Slice sampling (with Discussion). The Annals of Statistics, 31, 705 - 767.
Examples
## See the description of R commands for
## the population averaged AFT model
## with the Signal Tandmobiel data,
## analysis described in Komarek and Lesaffre (2006),
##
## R commands available in the documentation
## directory of this package as
## - see ex-tandmobPA.R and
## https://www2.karlin.mff.cuni.cz/ komarek/software/bayesSurv/ex-tandmobPA.pdf
##
Helping function for Bayesian regression with smoothed bivariate densities as the error term, based on possibly censored data
Description
These functions are not to be called by ordinary users.
These are just sub-parts of ‘bayesBisurvreg’ function to make it more readable for the programmer.
Usage
bayesBisurvreg.checkStore(store)
bayesBisurvreg.priorInit(dim, prior, init, design, mcmc.par,
prior2, init2, design2, mcmc.par2,
doubly)
bayesBisurvreg.priorBeta(prior.beta, init, design)
bayesBisurvreg.writeHeaders(dir, dim, nP, doubly, prior.init, store,
design, design2)
Arguments
store |
a list as required by the argument |
dim |
dimension of the response, 1 or 2 |
prior |
a list as required by the argument |
prior2 |
a list as required by the argument |
init |
a list as required by the argument |
init2 |
a list as required by the argument |
mcmc.par |
a list as required by the argument |
mcmc.par2 |
a list as required by the argument |
design |
an object as returned by the function
|
design2 |
an object as returned by the function
|
doubly |
logical indicating whether the response is doubly censored or not |
prior.beta |
a list as required by the argument |
dir |
path to the directory where the sampled values are to be stored |
nP |
sample size - number of observations if the univariate model is fitted, number of bivariate observational vectors if the bivariate model is fitted |
prior.init |
a list as returned by the function
|
Value
Some lists.
Value for bayesBisurvreg.priorInit
A~list with the following components:
- Gparmi
integer arguments for the G-spline constructor in the C++ code related to the onset/event time
- Gparmd
double arguments for the G-spline constructor in the C++ code related to the onset/event time
- y
vector of initial values for the log(onset time)/log(event time), sorted as
y_1[1], y_1[2], \dots, y_n[1], y_n[2]
in the case of bivariate response with sample size equal ton
- r
initial component labels (vector of size
n
) taking values from 1 to the total length of the G-spline related to the onset/event time- Gparmi2
integer arguments for the G-spline constructor in the C++ code related to time-to-event in the case of doubly censoring
- Gparmd2
double arguments for the G-spline constructor in the C++ code related to time-to-event in the case of doubly censoring
- y2
vector of initial values for the time-to-event in the case of doubly censoring sorted as
y_1[1], y_1[2], \dots, y_n[1], y_n[2]
in the case of bivariate response with sample size equal to
n
- r2
initial component labels (vector of size
n
) taking values from 1 to the total length of the G-spline related to time-to-event in the case of doubly censoring- iter
index of the nullth iteration
- specification
2 component vector (one component for onset, one for time-to-event), specification of the G-spline model (1 or 2), see
bayesHistogram
for more detail- y.left
lower limit of the log-response (or exact/right/left censored observation) as required by the C++ function
bayesBisurvreg
, related to the onset time in the case of doubly censoring and to the event time otherwise- y.right
upper limit of the log-response as required by the C++ function
bayesBisurvreg
, related to the onset time in the case of doubly censoring and to the event time otherwise- status
status vector as required by the C++ function
bayesBisurvreg
related to the onset time in the case of doubly censoring and to the event time otherwise- t2.left
lower limit of the response as required by the C++ function
bayesBisurvreg
, related to time-to-event in the case of doubly censoring, equal to 0 if there is no doubly-censoring- t2.right
upper limit of the response as required by the C++ function
bayesBisurvreg
, related to time-to-event in the case of doubly censoring, equal to 0 if there is no doubly-censoring- status2
status vector related to time-to-event in the case of doubly censoring, equal to 0 otherwise.
and the following attributes:
init |
prior |
mcmc.par |
init2 |
prior2 |
mcmc.par2 |
Value for bayesBisurvreg.priorBeta
A list with the following components:
- parmI
integer arguments for C++
classBetaGamma
constructor- parmD
double arguments for C++
classBetaGamma
constructor
and the following attributes:
- init
a~vector with initial values of the beta parameter, equal to
numeric(0)
if there are no regressors- prior.beta
a~list with components
mean.prior
andvar.prior
containing vectors with the prior mean and prior variance of thebeta
parameters
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Summary for the density estimate based on the mixture Bayesian AFT model.
Description
Function to summarize the results obtained using
bayessurvreg1
function.
Compute the conditional (given the number of mixture components) and unconditional estimate of the density function based on the values sampled using the reversible jumps MCMC (MCMC average evaluated in a grid of values).
Give also the values of each sampled density evaluated at that grid (returned as the attribute of the resulting object). Methods for printing and plotting are also provided.
Usage
bayesDensity(dir, stgrid, centgrid, grid, n.grid = 100,
skip = 0, by = 1, last.iter,
standard = TRUE, center = TRUE, unstandard = TRUE)
Arguments
dir |
directory where to search for files ‘mixmoment.sim’, ‘mweight.sim’, mmean.sim', ‘mvariance.sim’ with the McMC sample. |
stgrid |
grid of values at which the sampled standardized
densities are to be evaluated. If |
centgrid |
grid of values at which the sampled centered (but not
scaled) densities are to be evaluated. If |
grid |
grid of values at which the sampled densities are to be
evaluated. If |
n.grid |
the length of the grid if |
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. |
by |
additional thinning of the sample. |
last.iter |
index of the last row from *.sim files that should be
used. If not specified than it is set to the maximum available
determined according to the file |
standard |
if |
center |
if |
unstandard |
of |
Value
An object of class bayesDensity
is returned. This object is a
list and has potentially three components: standard
,
center
and
unstandard
. Each of these three components is a data.frame
with as many rows as number of grid points at which the density was
evaluated and with columns called ‘grid’, ‘unconditional’ and ‘k = 1’,
..., ‘k = k.max’ giving a predictive errr density, either averaged
over all sampled k
s (unconditional) or averaged over a
psecific number of mixture components.
Additionally, the object of class bayesDensity
has three
attributes:
sample.size |
a vector of length |
moments |
a |
k |
a |
There exist methods to print and plot objects of the class bayesDensity
.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2007). Bayesian accelerated failure time model for correlated interval-censored data with a normal mixture as an error distribution. Statistica Sinica, 17, 549–569.
Examples
## See the description of R commands for
## the models described in
## Komarek (2006),
## Komarek and Lesaffre (2007),
##
## R commands available
## in the documentation
## directory of this package
## - ex-cgd.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-cgd.pdf
##
## - ex-tandmobMixture.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobMixture.pdf
##
Summary for the density estimate based on the model with Bayesian G-splines.
Description
Compute the estimate of the density function based on the values sampled using the MCMC (MCMC average evaluated in a grid of values) in a model where density is specified as a Bayesian G-spline.
This function serves to summarize the MCMC chains related to the distributional parts
of the considered models obtained using the functions:
bayesHistogram
,
bayesBisurvreg
, bayessurvreg2
, bayessurvreg3
.
If asked, this function returns also the values of the G-spline evaluated in a grid at each iteration of MCMC.
Usage
bayesGspline(dir, extens="", extens.adjust="_b",
grid1, grid2, skip = 0, by = 1, last.iter, nwrite,
only.aver = TRUE, standard = FALSE, version = 0)
Arguments
dir |
directory where to search for files (‘mixmoment.sim’, ‘mweight.sim’, ‘mmean.sim’, ‘gspline.sim’) with the MCMC sample. | ||
extens |
an extension used to distinguish different sampled
G-splines if more G-splines were used in one simulation (e.g. with
doubly-censored data or in the model where both the error term and the
random intercept were defined as the G-splines). According to which
| ||
extens.adjust |
this argument is applicable for the situation when
the MCMC chains were created using the function
In that case the location of the error term and the random intercept
are separately not identifiable. Only the location of the sum
Argument The following values of
| ||
grid1 |
grid of values from the first dimension at which the sampled densities are to be evaluated. | ||
grid2 |
grid of values from the second dimension (if the G-spline
was bivariate) at which the sampled densities are to be
evaluated. This item is | ||
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. | ||
by |
additional thinning of the sample. | ||
last.iter |
index of the last row from *.sim files that should be
used. If not specified than it is set to the maximum available
determined according to the file | ||
nwrite |
frequency with which is the user informed about the
progress of computation (every | ||
only.aver |
| ||
standard |
| ||
version |
this argument indicates by which
|
Value
An object of class bayesGspline
is returned. This object is a
list with components
grid
, average
for the univariate G-spline and
components grid1
, grid2
, average
for the bivariate G-spline.
grid |
this is a grid of values (vector) at which the McMC average of the G-spline was computed. | ||||||||||||||||||
average |
these are McMC averages of the G-spline (vector) evaluated in
| ||||||||||||||||||
grid1 |
this is a grid of values (vector) for the first dimension at which the McMC average of the G-spline was computed. | ||||||||||||||||||
grid2 |
this is a grid of values (vector) for the second dimension at which the McMC average of the G-spline was computed. | ||||||||||||||||||
average |
this is a matrix
and
|
There exists a method to plot objects of the class bayesGspline
.
Attributes
Additionally, the object of class bayesGspline
has the following
attributes:
sample.size
a length of the McMC sample used to compute the McMC average.
sample
G-spline evaluated in a grid of values. This attribute is present only if
only.aver = FALSE
.For a univariate G-spline this is a matrix with
sample.size
columns and length(grid1) rows.For a bivariate G-spline this is a matrix with
sample.size
columns and length(grid1)*length(grid2) rows.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failurew time model for paired doubly interval-censored data. Statistical Modelling, 6, 3–22.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523–533.
Komárek, A., Lesaffre, E., and Legrand, C. (2007). Baseline and treatment effect heterogeneity for survival times between centers using a random effects accelerated failure time model with flexible error distribution. Statistics in Medicine, 26, 5457–5472.
Examples
## See the description of R commands for
## the models described in
## Komarek (2006),
## Komarek and Lesaffre (2006),
## Komarek and Lesaffre (2008),
## Komarek, Lesaffre, and Legrand (2007).
##
## R commands available
## in the documentation
## directory of this package
## - ex-tandmobPA.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobPA.pdf
## - ex-tandmobCS.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobCS.pdf
## - ex-eortc.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-eortc.pdf
##
Smoothing of a uni- or bivariate histogram using Bayesian G-splines
Description
A function to estimate a density of a uni- or bivariate (possibly censored) sample. The density is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and equal variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities in the density specification. Other method functions are available to visualize resulting density estimate.
This function served as a basis for further developed
bayesBisurvreg
, bayessurvreg2
and
bayessurvreg3
functions. However, in contrast to these
functions, bayesHistogram
does not allow for doubly censoring.
Bivariate case:
Let Y_{i,l},\; i=1,\dots,N,\; l=1,2
be
observations for the i
th cluster and the first and the second
unit (dimension). The bivariate observations
Y_i=(Y_{i,1},\,Y_{i,2})',\;i=1,\dots,N
are assumed to be i.i.d. with a~bivariate density
g_{y}(y_1,\,y_2)
. This density is expressed as
a~mixture of Bayesian G-splines (normal densities with equidistant
means and constant variance matrices). We distinguish two,
theoretically equivalent, specifications.
- Specification 1
-
(Y_1,\,Y_2)' \sim \sum_{j_1=-K_1}^{K_1}\sum_{j_2=-K_2}^{K_2} w_{j_1,j_2} N_2(\mu_{(j_1,j_2)},\,\mbox{diag}(\sigma_1^2,\,\sigma_2^2))
where
\sigma_1^2,\,\sigma_2^2
are unknown basis variances and\mu_{(j_1,j_2)} = (\mu_{1,j_1},\,\mu_{2,j_2})'
is an~equidistant grid of knots symmetric around the unknown point(\gamma_1,\,\gamma_2)'
and related to the unknown basis variances through the relationship\mu_{1,j_1} = \gamma_1 + j_1\delta_1\sigma_1,\quad j_1=-K_1,\dots,K_1,
\mu_{2,j_2} = \gamma_2 + j_2\delta_2\sigma_2,\quad j_2=-K_2,\dots,K_2,
where
\delta_1,\,\delta_2
are fixed constants, e.g.\delta_1=\delta_2=2/3
(which has a~justification of being close to cubic B-splines). - Specification 2
-
(Y_1,\,Y_2)' \sim (\alpha_1,\,\alpha_2)'+ \bold{S}\,(Y_1,\,Y_2)'
where
(\alpha_1,\,\alpha_2)'
is an unknown intercept term and\bold{S} \mbox{ is a diagonal matrix with } \tau_1 \mbox{ and }\tau_2 \mbox{ on a diagonal,}
i.e.\tau_1,\,\tau_2
are unknown scale parameters.(V_1,\,V_2)'
is then standardized observational vector which is distributed according to the bivariate normal mixture, i.e.(V_1,\,V_2)'\sim \sum_{j_1=-K_1}^{K_1}\sum_{j_2=-K_2}^{K_2} w_{j_1,j_2} N_2(\mu_{(j_1,j_2)},\,\mbox{diag}(\sigma_1^2, \sigma_2^2))
where
\mu_{(j_1,j_2)} = (\mu_{1,j_1},\,\mu_{2,j_2})'
is an~equidistant grid of fixed knots (means), usually symmetric about the fixed point(\gamma_1,\,\gamma_2)'=(0, 0)'
and\sigma_1^2,\,\sigma_2^2
are fixed basis variances. Reasonable values for the numbers of grid pointsK_1
andK_2
areK_1=K_2=15
with the distance between the two knots equal to\delta=0.3
and for the basis variances\sigma_1^2\sigma_2^2=0.2^2.
Univariate case:
It is a~direct simplification of the bivariate case.
Usage
bayesHistogram(y1, y2,
nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
prior, init = list(iter = 0),
mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1),
store = list(a = FALSE, y = FALSE, r = FALSE),
dir)
Arguments
y1 |
response for the first dimension in the form of a survival
object created using |
y2 |
response for the second dimension in the form of a survival
object created using |
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
|
prior |
a list that identifies prior hyperparameters and prior choices. See the paper Komárek and Lesaffre (2008) and the PhD. thesis Komárek (2006) for more details. Some prior parameters can be guessed by the function itself. If you
want to do so, set such parameters to
|
init |
a list of the initial values to start the McMC. Set to
|
mcmc.par |
a list specifying further details of the McMC simulation. There are default values implemented for all components of this list.
|
store |
a~list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
|
dir |
a string that specifies a directory where all sampled values are to be stored. |
Value
A list of class bayesHistogram
containing an information
concerning the initial values and prior choices.
Files created
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. All
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
- iteration.sim
one column labeled
iteration
with indeces of MCMC iterations to which the stored sampled values correspond.- mixmoment.sim
columns labeled
k
,Mean.1
,Mean.2
,D.1.1
,D.2.1
,D.2.2
in the bivariate case and columns labeledk
,Mean.1
,D.1.1
in the univariate case, wherek = number of mixture components that had probability numerically higher than zero;
Mean.1 =
\mbox{E}(Y_{i,1})
;Mean.2 =
\mbox{E}(Y_{i,2})
;D.1.1 =
\mbox{var}(Y_{i,1})
;D.2.1 =
\mbox{cov}(Y_{i,1},\,Y_{i,2})
;D.2.2 =
\mbox{var}(Y_{i,2})
.- mweight.sim
sampled mixture weights
w_{k_1,\,k_2}
of mixture components that had probabilities numerically higher than zero.- mmean.sim
indeces
k_1,\;k_2,
k_1 \in\{-K_1, \dots, K_1\},
k_2 \in\{-K_2, \dots, K_2\}
of mixture components that had probabilities numerically higher than zero. It corresponds to the weights inmweight.sim
.- gspline.sim
characteristics of the sampled G-spline (distribution of
(Y_{i,1},\,Y_{i,2})'
). This file together withmixmoment.sim
,mweight.sim
andmmean.sim
can be used to reconstruct the G-spline in each MCMC iteration.The file has columns labeled
gamma1
,gamma2
,sigma1
,sigma2
,delta1
,delta2
,intercept1
,intercept2
,scale1
,scale2
. The meaning of the values in these columns is the following:gamma1 = the middle knot
\gamma_1
in the first dimension. If ‘Specification’ is 2, this column usually contains zeros;gamma2 = the middle knot
\gamma_2
in the second dimension. If ‘Specification’ is 2, this column usually contains zeros;sigma1 = basis standard deviation
\sigma_1
of the G-spline in the first dimension. This column contains a~fixed value if ‘Specification’ is 2;sigma2 = basis standard deviation
\sigma_2
of the G-spline in the second dimension. This column contains a~fixed value if ‘Specification’ is 2;delta1 = distance
delta_1
between the two knots of the G-spline in the first dimension. This column contains a~fixed value if ‘Specification’ is 2;delta2 = distance
\delta_2
between the two knots of the G-spline in the second dimension. This column contains a~fixed value if ‘Specification’ is 2;intercept1 = the intercept term
\alpha_1
of the G-spline in the first dimension. If ‘Specification’ is 1, this column usually contains zeros;intercept2 = the intercept term
\alpha_2
of the G-spline in the second dimension. If ‘Specification’ is 1, this column usually contains zeros;scale1 = the scale parameter
\tau_1
of the G-spline in the first dimension. If ‘Specification’ is 1, this column usually contains ones;scale2 = the scale parameter
\tau_2
of the G-spline in the second dimension. ‘Specification’ is 1, this column usually contains ones.- mlogweight.sim
fully created only if
store$a = TRUE
. The file contains the transformed weightsa_{k_1,\,k_2},
k_1=-K_1,\dots,K_1,
k_2=-K_2,\dots,K_2
of all mixture components, i.e. also of components that had numerically zero probabilities.- r.sim
fully created only if
store$r = TRUE
. The file contains the labels of the mixture components into which the observations are intrinsically assigned. Instead of double indeces(k_1,\,k_2)
, values from 1 to(2\,K_1+1)\times (2\,K_2+1)
are stored here. Functionvecr2matr
can be used to transform it back to double indeces.- lambda.sim
either one column labeled
lambda
or two columns labeledlambda1
andlambda2
. These are the values of the smoothing parameter(s)\lambda
(hyperparameters of the prior distribution of the transformed mixture weightsa_{k_1,\,k_2}
).- Y.sim
fully created only if
store$y = TRUE
. It contains sampled (augmented) log-event times for all observations in the data set.- logposter.sim
columns labeled
loglik
,penalty
orpenalty1
andpenalty2
,logprw
. The columns have the following meaning (the formulas apply for the bivariate case).loglik
=
% -N\Bigl\{\log(2\pi) + \log(\sigma_1) + \log(\sigma_2)\Bigr\}- 0.5\sum_{i=1}^N\Bigl\{ (\sigma_1^2\,\tau_1^2)^{-1}\; (y_{i,1} - \alpha_1 - \tau_1\mu_{1,\,r_{i,1}})^2 + (\sigma_2^2\,\tau_2^2)^{-1}\; (y_{i,2} - \alpha_2 - \tau_2\mu_{2,\,r_{i,2}})^2 \Bigr\}
where
y_{i,l}
denotes (augmented) (i,l)th true log-event time. In other words,loglik
is equal to the conditional log-density\sum_{i=1}^N\,\log\Bigl\{p\bigl((y_{i,1},\,y_{i,2})\;\big|\;r_{i},\,\mbox{G-spline}\bigr)\Bigr\};
penalty1: If
prior$neighbor.system
="uniCAR"
: the penalty term for the first dimension not multiplied bylambda1
;penalty2: If
prior$neighbor.system
="uniCAR"
: the penalty term for the second dimension not multiplied bylambda2
;penalty: If
prior$neighbor.system
is different from"uniCAR"
: the penalty term not multiplied bylambda
;logprw
=
-2\,N\,\log\bigl\{\sum_{k_1}\sum_{k_2}a_{k_1,\,k_2}\bigr\} + \sum_{k_1}\sum_{k_2}N_{k_1,\,k_2}\,a_{k_1,\,k_2},
whereN_{k_1,\,k_2}
is the number of observations assigned intrinsincally to the(k_1,\,k_2)
th mixture component.In other words,
logprw
is equal to the conditional log-density\sum_{i=1}^N \log\bigl\{p(r_i\;|\;\mbox{G-spline weights})\bigr\}.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337 - 348.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523 - 533.
Komárek, A. and Lesaffre, E. (2006b). Bayesian semi-parametric accelerated failurew time model for paired doubly interval-censored data. Statistical Modelling, 6, 3 - 22.
Neal, R. M. (2003). Slice sampling (with Discussion). The Annals of Statistics, 31, 705 - 767.
Helping function for Bayesian smoothing of (bi)-variate densities based on possibly censored data
Description
These functions are not to be called by ordinary users.
These are just sub-parts of ‘bayesHistogram’ function to make it more readable for the programmer.
Usage
bayesHistogram.design(y1, y2)
bayesHistogram.checkStore(store)
bayesHistogram.priorInit(prior, init, mcmc.par, design)
bayesHistogram.writeHeaders(dir, design, prior.init, store)
Arguments
y1 |
response for the first dimension. This should be a~survival object
created by |
y2 |
response object for the second dimension (if bivariate
density is to be smoothed). This should be a~survival object created
by |
store |
a~list with appropriate components |
prior |
a~list as required by |
init |
a~list as required by |
mcmc.par |
a~list as required by |
design |
a~list with the design information as returned by the
function
|
dir |
string giving a~directory where to store sampled files |
prior.init |
an~object as returned by |
Value
Some lists.
Value for bayesHistogram.design
A~list with the following components:
- y.left
vector or matrix with either observed, right or left censored observations or with the lower limits of interval censored observations. It is a vector if
dim == 1
and it is a matrix with 2 rows andn
columns ifdim == 2
, wheren
is a~sample size. In that case, the first row of the matrix are responses for the first dimension and the second row of the matrix are responses for the second dimension.- y.right
vector or matrix with entries equal to 1 for observed, right or left censored observations and entries equal to the upper limits of interval censored observations. The structure is the same as that of
y.left
.- status
a~vector or matrix with censoring indicators (1 = exactly observed, 0 = right censored, 2 = left censored, 3 = interval censored). The structure is the same as that of
y.left
.- dim
dimension of the response, i.e. 1 (univariate smoothing) or 2 (bivariate smoothing)
Value for bayesHistogram.priorInit
A~list with the following components:
- Gparmi
integer parameters for the G-spline constructor in the C++ code
- Gparmd
double parameters for the G-spline constructor in the C++ code
- iter
index of the nullth iteration
- y
vector of initial values for the response, sorted as
y_1[1], y_1[2], \dots, y_n[1], y_n[2]
in the case of bivariate response with sample size equal ton
- r
initial component labels (vector of size
n
) taking values from 1 to the total length of the G-spline- specification
specification of the G-spline model (1 or 2), see
bayesHistogram
for more detail
and the following attributes:
init |
prior |
mcmc.par |
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Helping function for Bayesian survival regression models.
Description
These functions are not to be called by ordinary users.
These are just sub-parts of ‘bayessurvreg’ functions to make them more readable for the programmer.
Usage
bayessurvreg.design(m, formula, random, data, transform, dtransform)
bayessurvreg.checknsimul(nsimul)
Arguments
m , formula , random , data , transform , dtransform |
|
nsimul |
a list |
Value
Some lists.
Value for bayessurvreg.design
A~list with the following components:
- n
number of observations (in the case of bivariate data, this is a~number of single observations, i.e.
2\times \mbox{sample size}
) included in the dataset- ncluster
number of clusters included in the dataset. In the case of bivariate data this is equal to the number of bivariate observations. If there are no random effects included in the model and if the observations are not bivariate then
ncluster = n
- nwithin
a~vector of length equal to
ncluster
with numbers of observations within each cluster. In the case of bivariate observations this is a~vector filled with 2's, if there are no random effects and if the observations are not bivariate then this is a~vector filled with 1's- nY
number of columns in the response matrix
Y
. This is equal to 2 if there are no interval-censored observations and equal to 3 if there is at least one interval censored observation in the dataset- nX
number of columns in the design matrix
X
. Note that the matrixX
contains covariates for both fixed and random effects- nfixed
number of fixed effects involved in the model. Note that possible intercept is always removed from the model
- nrandom
number of random effects in the model, possible random intercept included
- randomInt
TRUE
/FALSE
indicating whether the random intercept is included in the model- Y
response matrix. Its last column is always equal to the status indicator (1 for exactly observed event times, 0 for right-censored observations, 2 for left-censored observations, 3 for interval-censored observations).
- X
design matrix containing covariates
- Yinit
response matrix extracted from
formula
usingmodel.extract
- Xinit
design matrix extracted from
formula
usingmodel.matrix
function- cluster
a~vector of length
n
with identifications of clusters (as given bycluster
informula
)- indb
a~vector of length
nX
identifying fixed and random effects.indb[j] = -1
if thej
th column of matrixX
is a fixed effects. it is equal tol
if thej
th column of matrixX
corresponds to thel
th random effect (in C++ indexing)- rnames.X
row names of
Xinit
- names.random
column names of the
X
matrix corespning to the random effects. If there is the random intercept in the model, the first component of this vector is equal to "(Intercept)"- factors
???
- n.factors
number of factor covariates in the model formula
- n.in.factors
???
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
A Bayesian survival regression with an error distribution expressed as a~normal mixture with unknown number of components
Description
A function to sample from the posterior distribution for a survival regression model
\log(T_{i,l}) = \beta^T x_{i,l} + b_i^T z_{i,l} +
\varepsilon_{i,l},\quad i=1,\dots,N,\ l=1,\dots,n_i,
where distribution of \varepsilon_{i,l}
is specified
as a normal mixture with unknown number of components as in Richardson
and Green (1997) and random effect b_i
is normally distributed.
See Komárek (2006) or Komárek and Lesaffre (2007) for more detailed description of prior assumptions.
Sampled values are stored on a disk to be further worked out by e.g.
coda
or boa
.
Usage
bayessurvreg1(formula, random,
data = parent.frame(), subset,
na.action = na.fail,
x = FALSE, y = FALSE, onlyX = FALSE,
nsimul = list(niter = 10, nthin = 1, nburn = 0,
nnoadapt = 0, nwrite = 10),
prior = list(kmax = 5, k.prior = "poisson", poisson.k = 3,
dirichlet.w = 1,
mean.mu = NULL, var.mu = NULL,
shape.invsig2 = 1.5,
shape.hyper.invsig2 = 0.8, rate.hyper.invsig2 = NULL,
pi.split = NULL, pi.birth = NULL,
Eb0.depend.mix = FALSE),
prior.beta, prior.b, prop.revjump,
init = list(iter = 0, mixture = NULL, beta = NULL,
b = NULL, D = NULL,
y = NULL, r = NULL, otherp = NULL, u = NULL),
store = list(y = TRUE, r = TRUE, b = TRUE, u = TRUE,
MHb = FALSE, regresres = FALSE),
dir,
toler.chol = 1e-10, toler.qr = 1e-10, ...)
Arguments
formula |
model formula for the ‘fixed’ part of the model, i.e. the
part that specifies The left-hand side of the If
| ||
random |
formula for the ‘random’ part of the model, i.e. the
part that specifies When some random effects are included the random intercept is added
by default. It can be removed using e.g. | ||
data |
optional data frame in which to interpret the variables occuring in the formulas. | ||
subset |
subset of the observations to be used in the fit. | ||
na.action |
function to be used to handle any | ||
x |
if | ||
y |
if | ||
onlyX |
if TRUE, no McMC is performed. The function returns only
a design matrix of your model (intercept excluded). It might be
useful to set up correctly a parameter for a block update of
| ||
nsimul |
a list giving the number of iterations of the McMC and other parameters of the simulation.
| ||
prior |
a list that identifies prior hyperparameters and prior
choices. See accompanying paper for more details.
Some prior parameters can be guessed by the function itself. If you
want to do so, set such parameters to
| ||
prior.beta |
a list defining the blocks of
| ||
prior.b |
a list defining the way in which the random effects are to be updated and the specification of priors for random effects related parameters. The list is assumed to have following components.
| ||
prop.revjump |
a list of values defining in which way the reversible jumps will be performed.
| ||
init |
a list of the initial values to start the McMC. Set to
| ||
store |
a list that defines which sampled values besides
regression parameters
In the case that either | ||
dir |
a string that specifies a directory where all sampled values are to be stored. | ||
toler.chol |
tolerance for the Cholesky decomposition. | ||
toler.qr |
tolerance for the QR decomposition. | ||
... |
who knows? |
Value
A list of class bayessurvreg
containing an information
concerning the initial values and prior choices.
Files created
Additionally, the following files with sampled values
are stored in a directory specified by dir
parameter of this
function (some of them are created only on request, see store
parameter of this function).
- iteration.sim
one column labeled
iteration
with indeces of McMC iterations to which the stored sampled values correspond.- loglik.sim
two columns labeled
loglik
andrandomloglik
.\mbox{\code{loglik}} = \sum_{i=1}^{N}\sum_{l=1}^{n_i}\Biggl[ \biggl\{ \log\Bigl(\frac{1}{\sqrt{2\pi\sigma_{r_{i,l}}^2}}\Bigr) -\frac{(y_{i,l} - \beta^T x_{i,l} - b_i^T z_{i,l} - \mu_{r_{i,l}})^2}{2\sigma_{r_{i,l}}^2} \biggr\} \Biggr],
where
y_{i,l}
denotes (sampled) (i,l)th true log-event time,b_i
sampled value of the random effect vector for the ith cluster,\beta
sampled value of the regression parameter\beta
andk, w_j, \mu_j, \sigma_j^2, j = 1,\dots,k
sampled mixture at each iteration.\mbox{\code{randomloglik}} = \sum_{i=1}^{N}\log\Bigl(g(b_i)\Bigr),
where
g
denotes a density of (multivariate) normal distributionN(\gamma, D),
where\gamma
is a sampled value of the mean of random effect vector andD
is a sampled value of the covariance matrix of the random effects at each iteration.- mixmoment.sim
three columns labeled
k
,Intercept
andScale
. These are the number of mixture components, mean and standard deviation of the sampled error distribution (mixture) at each iteration.- mweight.sim
each row contains mixture weights
w_1,\dots,w_k
at each iteration. From the header of this file, maximal number of mixture components specified in the prior can be derived.- mmean.sim
each row contains mixture means
\mu_1,\dots,\mu_k
at each iteration. From the header of this file, maximal number of mixture components specified in the prior can be derived.- mvariance.sim
each row contains mixture variances
\sigma^2_1,\dots,\sigma^2_k
at each iteration. From the header of this file, maximal number of mixture components specified in the prior can be derived.- beta.sim
columns labeled according to name of the design matrix. These are sampled values of regression parameters
\beta
and means of random effects\gamma
(except the mean of the random intercept which is zero).- b.sim
columns labeled
nameb[1].id[1], ..., nameb[q].id[1], ..., nameb[1].id[N], ..., nameb[q].id[N]
, whereq
is a dimension of the random effect vectorb_i
andN
number of clusters.nameb
is replaced by appropriate column name from the design matrix andid
is replaced by identificator of the clusters. This gives sampled values of the random effects for each cluster.- D.sim
columns labeled
det, D.s.t, s = 1,..., q, t = s,...,q
, whereq
is dimension of the random effect vectorb_i
. Columndet
gives a determinant of the covariance matrixD
of the random effects at each iteration, remaining columns give a lower triangle of this matrix at each iteration.- Y.sim
columns labeled
Y[m]
wherem
goes from 1 to\sum_{i=1}^{N}n_i
. This gives sampled log-event times for each observation in the dataset at each iteration.- r.sim
columns labeled
r[m]
wherem
goes from 1 to\sum_{i=1}^{N}n_i
. This gives sampled mixture labels for each observation in the dataset at each iteration.- otherp.sim
Currently only one column labeled
eta
that gives sampled values of the hyperparameter\eta
.- MHinfo.sim
this gives the information concerning the performance of reversible jump algorithm and a sampler of regression parameters
\beta
and means of random effects\gamma
. It has columnsaccept.spl.comb
relative frequency of accepted split-combine moves up to that iteration.
split
relative frequency of proposed split moves up to that iteration.
accept.birth.death
relative frequency of accepted birth-death moves up to that iteration.
birth
relative frequency of proposed birth moves up to that iteration.
beta.block.m
with
m
going from 1 to number of defined blocks of beta parameters. This gives a relative frequency of accepted proposals for each block up to that iteration. When Gibbs move is used, these should be columns of ones.
- MHbinfo.sim
this gives the information concerning the performance of a sampler for random effects (relative frequency of accepted values for each cluster and each block of random effects updated together). When Gibbs move is used only ones are seen in this file.
- u.sim
Sampled values of canonical proposal variables for reversible jump algorithm are stored here. This file is useful only when trying to restart the simulation from some specific point.
- regresres.sim
columns labeled
res[m]
wherem
goes from 1 to\sum_{i=1}^{N}n_i
This stores so called regression residuals for each observation at each iteration. This residual is defined asres_{i,l} = y_{i,l} - \beta^T x_{i,l} - b_i z_{i,l},\qquad i=1\dots,N,\quad l=1,\dots,n_i,
where
y_{i,l}
is a (sampled) log-event time at each iteration.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2007). Bayesian accelerated failure time model for correlated interval-censored data with a normal mixture as an error distribution. Statistica Sinica, 17, 549 - 569.
Brooks, S. P., Giudici, P., and Roberts, G. O. (2003). Efficient construction of reversible jump Markov chain Monte Carlo proposal distribution (with Discussion). Journal of the Royal Statistical Society B, 65, 3 - 55.
Green, P. J. (1995). Reversible jump MCMC computation and Bayesian model determination. Biometrika, 82, 711 - 732.
Richardson, S., and Green, P. J. (1997). On Bayesian analysis of mixtures with unknown number of components (with Discussion). Journal of the Royal Statistical Society B, 59, 731 - 792.
Examples
## See the description of R commands for
## the models described in
## Komarek (2006),
## Komarek and Lesaffre (2007).
##
## R commands available
## in the documentation
## directory of this package as
## - ex-cgd.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-cgd.pdf
##
## - ex-tandmobMixture.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobMixture.pdf
##
Read the initial values for the Bayesian survival regression model to the list.
Description
This function creates the list of initial values as required by the init
argument of the function bayessurvreg1
. The initials are taken from
the files that are of the form of the files where the simulated values from
the McMC run performed by the function bayessurvreg1
are stored.
The files are assumed to have the following names: "iteration.sim",
"mixmoment.sim", "mweight.sim", "mmean.sim", "mvariance.sim",
"beta.sim", "b.sim", "Y.sim", "r.sim", "D.sim", "otherp.sim", "u.sim". Some of these files
may be missing. In that case, the corresponding initial is filled by NULL
.
Usage
bayessurvreg1.files2init(dir = getwd(), row, kmax)
Arguments
dir |
string giving the directory where it will be searched for the files with initial values. |
row |
the row (possible header does not count) from the files with the values that will be considered to give the initial values. By default, it is the last row from the files. |
kmax |
maximal number of mixture components. This must be given
only if |
Value
A list with components called "iter", "mixture", "beta", "b", "D", "y",
"r", "otherp", "u"
in the form as required by the argument init
of the function bayessurvreg1
.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Helping function for Bayesian survival regression models, version 1.
Description
These functions are not to be called by ordinary users.
These are just sub-parts of ‘bayessurvreg1’ function to make it more readable for the programmer.
Usage
bayessurvreg1.checkStore(store)
bayessurvreg1.priorInit(prior, init, Yinit, Xinit, n, nX, nrandom,
ncluster, indb, randomInt, toler.chol)
bayessurvreg1.priorBeta(prior.beta, nX, indb, factors,
n.factors, n.in.factors)
bayessurvreg1.priorb(prior.b, nrandom, ncluster, toler.chol)
bayessurvreg1.writeHeaders(dir, prior, store, nX, X, names.random,
ncluster, nrandom, rnamesX, unique.cluster, nBetaBlocks, nbBlocks)
bayessurvreg1.revjump(prop.revjump)
Arguments
store |
a~list as required by the |
prior , init , Yinit , Xinit , n , nX , nrandom , ncluster |
??? |
indb , randomInt , toler.chol |
??? |
prior.beta , factors , n.factors , n.in.factors |
??? |
prior.b |
??? |
prop.revjump |
??? |
dir , X , names.random |
??? |
rnamesX , unique.cluster , nBetaBlocks , nbBlocks |
??? |
Value
Some lists.
Value for bayessurvreg1.priorb
A~list with the following components:
- double
double vector for C++ constructor of class
bblocks
. It has the following parts:priordD covparLV halfRangeUnif weightUnif - integer
integer vector for C++ constructor of class
bblocks
. It has the following parts:nrandom ncluster priorD typeUpd nBlocks nInBlock lcovparLV indBlockLV
and the following attributes:
prior.b |
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Cluster-specific accelerated failure time model for multivariate, possibly doubly-interval-censored data. The error distribution is expressed as a penalized univariate normal mixture with high number of components (G-spline). The distribution of the vector of random effects is multivariate normal.
Description
A function to estimate a regression model with possibly clustered (possibly right, left, interval or doubly-interval censored) data. In the case of doubly-interval censoring, different regression models can be specified for the onset and event times.
(Multivariate) random effects, normally distributed and acting as in the linear mixed model, normally distributed, can be included to adjust for clusters.
The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities.
For details, see Komárek (2006), and Komárek, Lesaffre and Legrand (2007).
We explain first in more detail a model without doubly censoring.
Let T_{i,l},\; i=1,\dots, N,\; l=1,\dots, n_i
be event times for i
th cluster and the units within that cluster
The following regression model is assumed:
\log(T_{i,l}) = \beta'x_{i,l} + b_i'z_{i,l} + \varepsilon_{i,l},\quad i=1,\dots, N,\;l=1,\dots, n_i
where \beta
is unknown regression parameter vector,
x_{i,l}
is a vector of covariates.
b_i
is a (multivariate) cluster-specific random effect
vector and z_{i,l}
is a vector of covariates for random
effects.
The random effect vectors b_i,\;i=1,\dots, N
are assumed to be i.i.d. with a (multivariate) normal distribution
with the mean \beta_b
and a covariance matrix
D
. Hierarchical centring (see Gelfand, Sahu, Carlin, 1995) is
used. I.e. \beta_b
expresses the average effect of the
covariates included in z_{i,l}
. Note that covariates
included in z_{i,l}
may not be included in the covariate
vector x_{i,l}
. The covariance matrix D
is
assigned an inverse Wishart prior distribution in the next level of hierarchy.
The error terms
\varepsilon_{i,l},\;i=1,\dots, N, l=1,\dots, n_i
are assumed to be i.i.d. with a univariate density
g_{\varepsilon}(e)
. This density is expressed as
a mixture of Bayesian G-splines (normal densities with equidistant
means and constant variances). We distinguish two,
theoretically equivalent, specifications.
- Specification 1
-
\varepsilon \sim \sum_{j=-K}^{K} w_{j} N(\mu_{j},\,\sigma^2)
where
\sigma^2
is the unknown basis variance and\mu_{j},\;j=-K,\dots, K
is an equidistant grid of knots symmetric around the unknown point\gamma
and related to the unknown basis variance through the relationship\mu_{j} = \gamma + j\delta\sigma,\quad j=-K,\dots,K,
where
\delta
is fixed constants, e.g.\delta=2/3
(which has a justification of being close to cubic B-splines). - Specification 2
-
\varepsilon \sim \alpha + \tau\,V
where
\alpha
is an unknown intercept term and\tau
is an unknown scale parameter.V
is then standardized error term which is distributed according to the univariate normal mixture, i.e.V\sim \sum_{j=-K}^{K} w_{j} N(\mu_{j},\,\sigma^2)
where
\mu_{j},\;j=-K,\dots, K
is an equidistant grid of fixed knots (means), usually symmetric about the fixed point\gamma=0
and\sigma^2
is fixed basis variance. Reasonable values for the numbers of grid pointsK
isK=15
with the distance between the two knots equal to\delta=0.3
and for the basis variance\sigma^2=0.2^2.
Personally, I found Specification 2 performing better. In the paper Komárek, Lesaffre and Legrand (2007) only Specification 2 is described.
The mixture weights
w_{j},\;j=-K,\dots, K
are
not estimated directly. To avoid the constraints
0 < w_{j} < 1
and
\sum_{j=-K}^{K}\,w_j = 1
transformed weights a_{j},\;j=-K,\dots, K
related to the original weights by the logistic transformation:
a_{j} = \frac{\exp(w_{j})}{\sum_{m}\exp(w_{m})}
are estimated instead.
A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek (2006) and to Komárek, Lesafre, and Legrand (2007).
If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.
Usage
bayessurvreg2(formula, random, formula2, random2,
data = parent.frame(),
na.action = na.fail, onlyX = FALSE,
nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
prior, prior.beta, prior.b, init = list(iter = 0),
mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1),
prior2, prior.beta2, prior.b2, init2,
mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1),
store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE),
dir)
Arguments
formula |
model formula for the regression. In the case of doubly-censored data, this is the model formula for the onset time. The left-hand side of the In the formula all covariates appearing both in the vector
If
| |
random |
formula for the ‘random’ part of the model, i.e. the
part that specifies the covariates If omitted, no random part is included in the model. E.g. to specify the model with a
random intercept, say When some random effects are included the random intercept is added
by default. It can be removed using e.g. | |
formula2 |
model formula for the regression of the time-to-event in
the case of doubly-censored data. Ignored otherwise. The same structure as
for | |
random2 |
specification of the ‘random’ part of the model for
time-to-event in the case of doubly-censored data. Ignored
otherwise. The same structure as for | |
data |
optional data frame in which to interpret the variables
occuring in the | |
na.action |
the user is discouraged from changing the default
value | |
onlyX |
if | |
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
| |
prior |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by The item | |
prior.b |
a list defining the way in which the random effects
involved in
| |
prior.beta |
prior specification for the regression parameters,
in the case of doubly-censored data for the regression parameters of
the onset time, i.e. it is related to This should be a list with the following components:
It is recommended to run the function
bayessurvreg2 first with its argument | |
init |
an optional list with initial values for the MCMC related
to the model given by
| |
mcmc.par |
a list specifying how some of the G-spline parameters
related to the distribution of the error term from In contrast to | |
prior2 |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by | |
prior.b2 |
prior specification for the parameters related to the
random effects from | |
prior.beta2 |
prior specification for the regression parameters
of time-to-event in the case of doubly censored data (related to
| |
init2 |
an optional list with initial values for the MCMC related
to the model given by | |
mcmc.par2 |
a list specifying how some of the G-spline parameters
related to | |
store |
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
| |
dir |
a string that specifies a directory where all sampled values are to be stored. |
Value
A list of class bayessurvreg2
containing an information
concerning the initial values and prior choices.
Files created
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. During the burn-in, only
every nsimul$nwrite
value is written. After the burn-in, all
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
- iteration.sim
one column labeled
iteration
with indeces of MCMC iterations to which the stored sampled values correspond.- mixmoment.sim
columns labeled
k
,Mean.1
,D.1.1
, wherek = number of mixture components that had probability numerically higher than zero;
Mean.1 =
\mbox{E}(\varepsilon_{i,l})
;D.1.1 =
\mbox{var}(\varepsilon_{i,l})
;all related to the distribution of the error term from the model given by
formula
.- mixmoment_2.sim
in the case of doubly-censored data, the same structure as
mixmoment.sim
, however related to the model given byformula2
.- mweight.sim
sampled mixture weights
w_{k}
of mixture components that had probabilities numerically higher than zero. Related to the model given byformula
.- mweight_2.sim
in the case of doubly-censored data, the same structure as
mweight.sim
, however related to the model given byformula2
.- mmean.sim
indeces
k,
k \in\{-K, \dots, K\}
of mixture components that had probabilities numerically higher than zero. It corresponds to the weights inmweight.sim
. Related to the model given byformula
.- mmean_2.sim
in the case of doubly-censored data, the same structure as
mmean.sim
, however related to the model given byformula2
.- gspline.sim
characteristics of the sampled G-spline (distribution of
\varepsilon_{i,l}
) related to the model given byformula
. This file together withmixmoment.sim
,mweight.sim
andmmean.sim
can be used to reconstruct the G-spline in each MCMC iteration.The file has columns labeled
gamma1
,sigma1
,delta1
,intercept1
,scale1
, The meaning of the values in these columns is the following:gamma1 = the middle knot
\gamma
If ‘Specification’ is 2, this column usually contains zeros;sigma1 = basis standard deviation
\sigma
of the G-spline. This column contains a fixed value if ‘Specification’ is 2;delta1 = distance
delta
between the two knots of the G-spline. This column contains a fixed value if ‘Specification’ is 2;intercept1 = the intercept term
\alpha
of the G-spline. If ‘Specification’ is 1, this column usually contains zeros;scale1 = the scale parameter
\tau
of the G-spline. If ‘Specification’ is 1, this column usually contains ones;- gspline_2.sim
in the case of doubly-censored data, the same structure as
gspline.sim
, however related to the model given byformula2
.- mlogweight.sim
fully created only if
store$a = TRUE
. The file contains the transformed weightsa_{k},
k=-K,\dots,K
of all mixture components, i.e. also of components that had numerically zero probabilities. This file is related to the error distribution of the model given byformula
.- mlogweight_2.sim
fully created only if
store$a2 = TRUE
and in the case of doubly-censored data, the same structure asmlogweight.sim
, however related to the error distribution of the model given byformula2
.- r.sim
fully created only if
store$r = TRUE
. The file contains the labels of the mixture components into which the residuals are intrinsically assigned. Instead of indeces on the scale\{-K,\dots, K\}
values from 1 to(2\,K+1)
are stored here. Functionvecr2matr
can be used to transform it back to indices from-K
toK
.- r_2.sim
fully created only if
store$r2 = TRUE
and in the case of doubly-censored data, the same structure asr.sim
, however related to the model given byformula2
.- lambda.sim
one column labeled
lambda
. These are the values of the smoothing parameter\lambda
(hyperparameters of the prior distribution of the transformed mixture weightsa_{k}
). This file is related to the model given byformula
.- lambda_2.sim
in the case of doubly-censored data, the same structure as
lambda.sim
, however related to the model given byformula2
.- beta.sim
sampled values of the regression parameters, both the fixed effects
\beta
and means of the random effects\beta_b
(except the random intercept which has always the mean equal to zero). This file is related to the model given byformula
. The columns are labeled according to thecolnames
of the design matrix.- beta_2.sim
in the case of doubly-censored data, the same structure as
beta.sim
, however related to the model given byformula2
.- D.sim
sampled values of the covariance matrix
D
of the random effects. The file has1 + 0.5\,q\,(q+1)
columns (q
is the dimension of the random effect vectorb_i
). The first column labeleddet
contains the determinant of the sampled matrix, additional columns labeledD.1.1
,D.2.1
, ...,D.q.1
, ...D.q.q
contain the lower triangle of the sampled matrix. This file is related to the model specified byformula
andrandom
.- D_2.sim
in the case of doubly-censored data, the same structure as
D.sim
, however related to the model given byformula2
andrandom2
.- b.sim
fully created only if
store$b = TRUE
. It contains sampled values of random effects for all clusters in the data set. The file hasq\times N
columns sorted asb_{1,1},\dots,b_{1,q},\dots, b_{N,1},\dots,b_{N,q}
. This file is related to the model given byformula
andrandom
.- b_2.sim
fully created only if
store$b2 = TRUE
and in the case of doubly-censored data, the same structure asb.sim
, however related to the model given byformula2
andrandom2
.- Y.sim
fully created only if
store$y = TRUE
. It contains sampled (augmented) log-event times for all observations in the data set.- Y_2.sim
fully created only if
store$y2 = TRUE
and in the case of doubly-censored data, the same structure asY.sim
, however related to the model given byformula2
.- logposter.sim
columns labeled
loglik
,penalty
, andlogprw
. This file is related to the model given byformula
. The columns have the following meaning.loglik
=
% - (\sum_{i=1}^N\,n_i)\,\Bigl\{\log(\sqrt{2\pi}) + \log(\sigma) \Bigr\}- 0.5\sum_{i=1}^N\sum_{l=1}^{n_i} \Bigl\{ (\sigma^2\,\tau^2)^{-1}\; (y_{i,l} - x_{i,l}'\beta - z_{i,l}'b_i - \alpha - \tau\mu_{r_{i,l}})^2 \Bigr\}
where
y_{i,l}
denotes (augmented) (i,l)th true log-event time.In other words,
loglik
is equal to the conditional log-density\sum_{i=1}^N \sum_{l=1}^{n_i}\,\log\Bigl\{p\bigl(y_{i,l}\;\big|\;r_{i,l},\,\beta,\,b_i,\,\mbox{G-spline}\bigr)\Bigr\};
penalty: the penalty term
-\frac{1}{2}\sum_{k}\Bigl(\Delta\, a_k\Bigr)^2
(not multiplied by
\lambda
);logprw
=
-2\,(\sum_i n_i)\,\log\bigl\{\sum_{k}a_{k}\bigr\} + \sum_{k}N_{k}\,a_{k},
whereN_{k}
is the number of residuals assigned intrinsincally to thek
th mixture component.In other words,
logprw
is equal to the conditional log-density\sum_{i=1}^N\sum_{l=1}^{n_i} \log\bigl\{p(r_{i,l}\;|\;\mbox{G-spline weights})\bigr\}.
- logposter_2.sim
in the case of doubly-censored data, the same structure as
logposter.sim
, however related to the model given byformula2
.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Gelfand, A. E., Sahu, S. K., and Carlin, B. P. (1995). Efficient parametrisations for normal linear mixed models. Biometrika, 82, 479-488.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A., Lesaffre, E., and Legrand, C. (2007). Baseline and treatment effect heterogeneity for survival times between centers using a random effects accelerated failure time model with flexible error distribution. Statistics in Medicine, 26, 5457-5472.
Examples
## See the description of R commands for
## the model with EORTC data,
## analysis described in Komarek, Lesaffre and Legrand (2007).
##
## R commands available in the documentation
## directory of this package
## as ex-eortc.R and
## https://www2.karlin.mff.cuni.cz/ komarek/software/bayesSurv/ex-eortc.pdf
##
Helping functions for Bayesian regression with an error distribution smoothed using G-splines
Description
These functions are not to be called by ordinary users.
These are just sub-parts of ‘bayessurvreg2’ function to make it more readable for the programmer.
Usage
bayessurvreg2.checkStore(store)
bayessurvreg2.priorInit(prior, init, design, mcmc.par,
prior2, init2, design2, mcmc.par2, doubly)
bayessurvreg2.priorBeta(prior.beta, init, design)
bayessurvreg2.priorb(prior.b, init, design)
bayessurvreg2.writeHeaders(dir, doubly, prior.init, store, design, design2)
Arguments
store |
a~list as required by the argument |
prior |
a~list as required by the argument |
prior2 |
a~list as required by the argument |
init |
a~list as required by the argument |
init2 |
a~list as required by the argument |
mcmc.par |
a~list as required by the argument |
mcmc.par2 |
a~list as required by the argument |
design |
an~object as returned by the function
|
design2 |
an~object as returned by the function
|
doubly |
logical indicating whether the response is doubly censored or not |
prior.beta |
a~list as required by the argument |
prior.b |
a~list as required by the argument |
dir |
path to the directory where the sampled values are to be stored |
prior.init |
a~list as returned by the function
|
Value
Some lists.
Value for bayessurvreg2.priorBeta
A list with the following components:
- parmI
integer arguments for C++
BetaGamma
constructor- parmD
double arguments for C++
BetaGamma
constructor
and the following attributes:
init |
prior.beta |
Value for bayessurvreg2.priorInit
The same object as that returned by bayesBisurvreg.priorInit
.
Value for bayessurvreg2.priorb
A list with the following components:
- bparmI
integer arguments for C++
RandomEff
constructor- bparmD
double arguments for C++
RandomEff
constructor- DparmI
integer arguments for C++
CovMatrix
constructor- DparmD
double arguments for C++
CovMatrix
constructor
and the following attributes:
init.b |
init.D |
prior.b |
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Cluster-specific accelerated failure time model for multivariate, possibly doubly-interval-censored data with flexibly specified random effects and/or error distribution.
Description
A function to estimate a regression model with possibly clustered (possibly right, left, interval or doubly-interval censored) data. In the case of doubly-interval censoring, different regression models can be specified for the onset and event times.
A univariate random effect (random intercept) with the distribution expressed as a penalized normal mixture can be included in the model to adjust for clusters.
The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities.
For details, see Komárek (2006) and Komárek and Lesaffre (2008).
SUPPLEMENTED IN 06/2013: Interval-censored times might be subject to misclassification. In case of doubly-interval-censored data, the event time might be subject to misclassification. For details, see García-Zattera, Jara and Komárek (2016).
We explain first in more detail a model without doubly censoring.
Let T_{i,l},\; i=1,\dots, N,\; l=1,\dots, n_i
be event times for i
th cluster and the units within that cluster
The following regression model is assumed:
\log(T_{i,l}) = \beta'x_{i,l} + b_i + \varepsilon_{i,l},\quad i=1,\dots, N,\;l=1,\dots, n_i
where \beta
is unknown regression parameter vector,
x_{i,l}
is a vector of covariates.
b_i
is a cluster-specific random effect (random intercept).
The random effects b_i,\;i=1,\dots, N
are assumed to be i.i.d. with a univariate density g_{b}(b)
.
The error terms
\varepsilon_{i,l},\;i=1,\dots, N, l=1,\dots, n_i
are assumed to be i.i.d. with a univariate density
g_{\varepsilon}(e)
.
Densities g_{b}
and g_{\varepsilon}
are
both expressed as
a mixture of Bayesian G-splines (normal densities with equidistant
means and constant variances). We distinguish two,
theoretically equivalent, specifications.
In the following, the density for \varepsilon
is explicitely described. The density for b
is obtained in
an analogous manner.
- Specification 1
-
\varepsilon \sim \sum_{j=-K}^{K} w_{j} N(\mu_{j},\,\sigma^2)
where
\sigma^2
is the unknown basis variance and\mu_{j},\;j=-K,\dots, K
is an equidistant grid of knots symmetric around the unknown point\gamma
and related to the unknown basis variance through the relationship\mu_{j} = \gamma + j\delta\sigma,\quad j=-K,\dots,K,
where
\delta
is fixed constants, e.g.\delta=2/3
(which has a justification of being close to cubic B-splines). - Specification 2
-
\varepsilon \sim \alpha + \tau\,V
where
\alpha
is an unknown intercept term and\tau
is an unknown scale parameter.V
is then standardized error term which is distributed according to the univariate normal mixture, i.e.V\sim \sum_{j=-K}^{K} w_{j} N(\mu_{j},\,\sigma^2)
where
\mu_{j},\;j=-K,\dots, K
is an equidistant grid of fixed knots (means), usually symmetric about the fixed point\gamma=0
and\sigma^2
is fixed basis variance. Reasonable values for the numbers of grid pointsK
isK=15
with the distance between the two knots equal to\delta=0.3
and for the basis variance\sigma^2=0.2^2.
Personally, I found Specification 2 performing better. In the paper Komárek and Lesaffre (2008) only Specification 2 is described.
The mixture weights
w_{j},\;j=-K,\dots, K
are
not estimated directly. To avoid the constraints
0 < w_{j} < 1
and
\sum_{j=-K}^{K}\,w_j = 1
transformed weights a_{j},\;j=-K,\dots, K
related to the original weights by the logistic transformation:
a_{j} = \frac{\exp(w_{j})}{\sum_{m}\exp(w_{m})}
are estimated instead.
A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek and Lesaffre (2008).
If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.
In the case one wishes to link the random intercept of the onset model and the random intercept of the time-to-event model, there are the following possibilities.
Bivariate normal distribution
It is assumed that the pair of random intercepts from the onset and
time-to-event part of the model are normally distributed with zero
mean and an unknown covariance matrix D
.
A priori, the inverse covariance matrix D^{-1}
is
addumed to follow a Wishart distribution.
Unknown correlation between the basis G-splines
Each pair of basis G-splines describing the distribution of the random
intercept in the onset part and the time-to-event part of the model is
assumed to be correlated with an unknown correlation coefficient
\varrho
. Note that this is just an experiment and is no
more further supported.
Prior distribution on \varrho
is assumed to be
uniform. In the MCMC, the Fisher Z transform of the \varrho
given by
Z =
-\frac{1}{2}\log\Bigl(\frac{1-\varrho}{1+\varrho}\Bigr)=\mbox{atanh}(\varrho)
is sampled. Its prior is derived from the uniform prior
\mbox{Unif}(-1,\;1)
put on \varrho.
The Fisher Z transform is updated using the Metropolis-Hastings alhorithm. The proposal distribution is given either by a normal approximation obtained using the Taylor expansion of the full conditional distribution or by a Langevin proposal (see Robert and Casella, 2004, p. 318).
Usage
bayessurvreg3(formula, random, formula2, random2,
data = parent.frame(),
classification,
classParam = list(Model = c("Examiner", "Factor:Examiner"),
a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1,
init.sens = NULL, init.spec = NULL),
na.action = na.fail, onlyX = FALSE,
nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
prior, prior.beta, prior.b, init = list(iter = 0),
mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1,
type.update.a.b = "slice", k.overrelax.a.b = 1,
k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
prior2, prior.beta2, prior.b2, init2,
mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1,
type.update.a.b = "slice", k.overrelax.a.b = 1,
k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
priorinit.Nb,
rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1),
store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE,
a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE),
dir)
bayessurvreg3Para(formula, random, formula2, random2,
data = parent.frame(),
classification,
classParam = list(Model = c("Examiner", "Factor:Examiner"),
a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1,
init.sens = NULL, init.spec = NULL),
na.action = na.fail, onlyX = FALSE,
nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
prior, prior.beta, prior.b, init = list(iter = 0),
mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1,
type.update.a.b = "slice", k.overrelax.a.b = 1,
k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
prior2, prior.beta2, prior.b2, init2,
mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1,
type.update.a.b = "slice", k.overrelax.a.b = 1,
k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
priorinit.Nb,
rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1),
store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE,
a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE),
dir)
Arguments
formula |
model formula for the regression. In the case of doubly-censored data, this is the model formula for the onset time. The left-hand side of the Intercept is implicitely included in the model by the
estimation of the error distribution. As a consequence If
| ||||
random |
formula for the ‘random’ part of the model.
In the case of doubly-censored data, this is the
is allowed. If omitted, no random part is included in the model. | ||||
formula2 |
model formula for the regression of the time-to-event in
the case of doubly-censored data. Ignored otherwise. The same structure as
for | ||||
random2 |
specification of the ‘random’ part of the model for
time-to-event in the case of doubly-censored data. Ignored
otherwise. The same structure as for | ||||
data |
optional data frame in which to interpret the variables
occuring in the | ||||
classification |
Possible additional columns are ignored. If missing, no misclassification is considered. | ||||
classParam |
a The following components of the
| ||||
na.action |
the user is discouraged from changing the default
value | ||||
onlyX |
if | ||||
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
| ||||
prior |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by The item | ||||
prior.b |
a list specifying the prior distribution of the
G-spline defining the distribution of the random intercept in the
regression model given by It is ignored if the argument The item | ||||
prior.beta |
prior specification for the regression parameters,
in the case of doubly-censored data for the regression parameters of
the onset time, i.e. it is related to This should be a list with the following components:
It is recommended to run the function
bayessurvreg3 first with its argument | ||||
init |
an optional list with initial values for the MCMC related
to the model given by
| ||||
mcmc.par |
a list specifying how some of the G-spline parameters
related to the distribution of the error term and of the random
intercept from Compared to the mcmc.par argument of the function
In contrast to | ||||
prior2 |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by | ||||
prior.b2 |
prior specification for the parameters related to the
random effects from It is ignored if the argument | ||||
prior.beta2 |
prior specification for the regression parameters
of time-to-event in the case of doubly censored data (related to
| ||||
init2 |
an optional list with initial values for the MCMC related
to the model given by | ||||
mcmc.par2 |
a list specifying how some of the G-spline parameters
related to | ||||
priorinit.Nb |
a list specifying the prior of the random intercepts in the case of the AFT model with doubly-interval-censored data and onset, time-to-event random intercepts following bivariate normal distribution. The list should have the following components.
| ||||
rho |
a list specifying possible correlation between the onset
random intercept and the time-to-event random intercept in the
experimental version of the model. If not given correlation is fixed
to It is ignored if the argument The list can have the following components.
| ||||
store |
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
| ||||
dir |
a string that specifies a directory where all sampled values are to be stored. |
Value
A list of class bayessurvreg3
containing an information
concerning the initial values and prior choices.
Files created
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. During the burn-in, only
every nsimul$nwrite
value is written. After the burn-in, all
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
- iteration.sim
one column labeled
iteration
with indeces of MCMC iterations to which the stored sampled values correspond.- mixmoment.sim
this file is related to the density of the error term from the model given by
formula
.Columns labeled
k
,Mean.1
,D.1.1
, wherek = number of mixture components that had probability numerically higher than zero;
Mean.1 =
\mbox{E}(\varepsilon_{i,l})
;D.1.1 =
\mbox{var}(\varepsilon_{i,l})
.- mixmoment_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.The same structure as
mixmoment.sim
.- mixmoment_2.sim
in the case of doubly-censored data. This file is related to the density of the error term from the model given by
formula2
.The same structure as
mixmoment.sim
.- mixmoment_b2.sim
in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.The same structure as
mixmoment.sim
.- mweight.sim
this file is related to the density of the error term from the model given by
formula
.Sampled mixture weights
w_{k}
of mixture components that had probabilities numerically higher than zero.- mweight_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.The same structure as
mweight.sim
.- mweight_2.sim
in the case of doubly-censored data. This file is related to the density of the error term from the model given by
formula2
.The same structure as
mweight.sim
.- mweight_b2.sim
in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.The same structure as
mweight.sim
.- mmean.sim
this file is related to the density of the error term from the model given by
formula
.Indeces
k,
k \in\{-K, \dots, K\}
of mixture components that had probabilities numerically higher than zero. It corresponds to the weights inmweight.sim
.- mmean_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.The same structure as
mmean.sim
.- mmean_2.sim
in the case of doubly-censored data. This file is related to the density of the error term from the model given by
formula2
.The same structure as
mmean.sim
.- mmean_b2.sim
in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.The same structure as
mmean.sim
.- gspline.sim
this file is related to the density of the error term from the model given by
formula
.Characteristics of the sampled G-spline. This file together with
mixmoment.sim
,mweight.sim
andmmean.sim
can be used to reconstruct the G-spline in each MCMC iteration.The file has columns labeled
gamma1
,sigma1
,delta1
,intercept1
,scale1
, The meaning of the values in these columns is the following:gamma1 = the middle knot
\gamma
If ‘Specification’ is 2, this column usually contains zeros;sigma1 = basis standard deviation
\sigma
of the G-spline. This column contains a fixed value if ‘Specification’ is 2;delta1 = distance
delta
between the two knots of the G-spline. This column contains a fixed value if ‘Specification’ is 2;intercept1 = the intercept term
\alpha
of the G-spline. If ‘Specification’ is 1, this column usually contains zeros;scale1 = the scale parameter
\tau
of the G-spline. If ‘Specification’ is 1, this column usually contains ones;- gspline_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.The same structure as
gspline.sim
.- gspline_2.sim
in the case of doubly-censored data. This file is related to the density of the error term from the model given by
formula2
.The same structure as
gspline.sim
.- gspline_b2.sim
in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.The same structure as
gspline.sim
.- mlogweight.sim
this file is related to the density of the error term from the model given by
formula
.Fully created only if
store$a = TRUE
. The file contains the transformed weightsa_{k},
k=-K,\dots,K
of all mixture components, i.e. also of components that had numerically zero probabilities.- mlogweight_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.Fully created only if
store$a.b = TRUE
.The same structure as
mlogweight.sim
.- mlogweight_2.sim
in the case of doubly-censored data. This file is related to the density of the error term from the model given by
formula2
.Fully created only if
store$a2 = TRUE
.The same structure as
mlogweight.sim
.- mlogweight_b2.sim
in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.Fully created only if
store$a.b2 = TRUE
.The same structure as
mlogweight.sim
.- r.sim
this file is related to the density of the error term from the model given by
formula
.Fully created only if
store$r = TRUE
. The file contains the labels of the mixture components into which the residuals are intrinsically assigned. Instead of indeces on the scale\{-K,\dots, K\}
values from 1 to(2\,K+1)
are stored here. Functionvecr2matr
can be used to transform it back to indices from-K
toK
.- r_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.Fully created only if
store$r.b = TRUE
.The same structure as
r.sim
.- r_2.sim
in the case of doubly-censored data. This file is related to the density of the error term from the model given by
formula2
.Fully created only if
store$r2 = TRUE
.The same structure as
r.sim
.- r_b2.sim
in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.Fully created only if
store$r.b2 = TRUE
.The same structure as
r.sim
.- lambda.sim
this file is related to the density of the error term from the model given by
formula
.One column labeled
lambda
. These are the values of the smoothing parameter\lambda
(hyperparameters of the prior distribution of the transformed mixture weightsa_{k}
).- lambda_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.The same structure as
lambda.sim
.- lambda_2.sim
in the case of doubly-censored data. This file is related to the density of the error term from the model given by
formula2
.The same structure as
lambda.sim
.- lambda_b2.sim
in the case of doubly-censored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.The same structure as
lambda.sim
.- beta.sim
this file is related to the model given by
formula
.Sampled values of the regression parameters
\beta
.The columns are labeled according to the
colnames
of the design matrix.- beta_2.sim
in the case of doubly-censored data, the same structure as
beta.sim
, however related to the model given byformula2
.- b.sim
this file is related to the model given by
formula
andrandom
.Fully created only if
store$b = TRUE
. It contains sampled values of random intercepts for all clusters in the data set. The file hasN
columns.- b_2.sim
fully created only if
store$b2 = TRUE
and in the case of doubly-censored data, the same structure asb.sim
, however related to the model given byformula2
andrandom2
.- Y.sim
this file is related to the model given by
formula
.Fully created only if
store$y = TRUE
. It contains sampled (augmented) log-event times for all observations in the data set.- Y_2.sim
fully created only if
store$y2 = TRUE
and in the case of doubly-censored data, the same structure asY.sim
, however related to the model given byformula2
.- logposter.sim
-
This file is related to the residuals of the model given by
formula
.Columns labeled
loglik
,penalty
, andlogprw
. The columns have the following meaning.loglik
=
% - (\sum_{i=1}^N\,n_i)\,\Bigl\{\log(\sqrt{2\pi}) + \log(\sigma) \Bigr\}- 0.5\sum_{i=1}^N\sum_{l=1}^{n_i} \Bigl\{ (\sigma^2\,\tau^2)^{-1}\; (y_{i,l} - x_{i,l}'\beta - b_i - \alpha - \tau\mu_{r_{i,l}})^2 \Bigr\}
where
y_{i,l}
denotes (augmented) (i,l)th true log-event time.In other words,
loglik
is equal to the conditional log-density\sum_{i=1}^N \sum_{l=1}^{n_i}\,\log\Bigl\{p\bigl(y_{i,l}\;\big|\;r_{i,l},\,\beta,\,b_i,\,\mbox{error-G-spline}\bigr)\Bigr\};
penalty: the penalty term
-\frac{1}{2}\sum_{k}\Bigl(\Delta\, a_k\Bigr)^2
(not multiplied by
\lambda
);logprw
=
-2\,(\sum_i n_i)\,\log\bigl\{\sum_{k}a_{k}\bigr\} + \sum_{k}N_{k}\,a_{k},
whereN_{k}
is the number of residuals assigned intrinsincally to thek
th mixture component.In other words,
logprw
is equal to the conditional log-density\sum_{i=1}^N\sum_{l=1}^{n_i} \log\bigl\{p(r_{i,l}\;|\;\mbox{error-G-spline weights})\bigr\}.
- logposter_b.sim
This file is related to the random intercepts from the model given by
formula
andrandom
.Columns labeled
loglik
,penalty
, andlogprw
. The columns have the following meaning.loglik
=
% - N\,\Bigl\{\log(\sqrt{2\pi}) + \log(\sigma) \Bigr\}- 0.5\sum_{i=1}^N \Bigl\{ (\sigma^2\,\tau^2)^{-1}\; (b_i - \alpha - \tau\mu_{r_{i}})^2 \Bigr\}
where
b_{i}
denotes (augmented) ith random intercept.In other words,
loglik
is equal to the conditional log-density\sum_{i=1}^N \,\log\Bigl\{p\bigl(b_{i}\;\big|\;r_{i},\,\mbox{b-G-spline}\bigr)\Bigr\};
The columns
penalty
andlogprw
have the analogous meaning as in the case of logposter.sim file.- logposter_2.sim
in the case of doubly-censored data, the same structure as
logposter.sim
, however related to the model given byformula2
.- logposter_b2.sim
in the case of doubly-censored data, the same structure as
logposter_b.sim
, however related to the model given byformula2
andrandom2
.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
García-Zattera, M. J., Jara, A., and Komárek, A. (2016). A flexible AFT model for misclassified clustered interval-censored data. Biometrics, 72, 473 - 483.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523 - 533.
Robert C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, Second Edition. New York: Springer Science+Business Media.
Examples
## See the description of R commands for
## the cluster specific AFT model
## with the Signal Tandmobiel data,
## analysis described in Komarek and Lesaffre (2007).
##
## R commands available in the documentation
## directory of this package
## - see ex-tandmobCS.R and
## https://www2.karlin.mff.cuni.cz/ komarek/software/bayesSurv/ex-tandmobCS.pdf
##
Helping functions for Bayesian regression with an error distribution smoothed using G-splines
Description
These functions are not to be called by ordinary users.
These are just sub-parts of ‘bayessurvreg3’ function to make it more readable for the programmer.
Usage
bayessurvreg3.checkStore(store)
bayessurvreg3.priorInit(prior, init, design, mcmc.par,
prior2, init2, design2, mcmc.par2, doubly)
bayessurvreg3.priorBeta(prior.beta, init, design)
bayessurvreg3.priorb(prior.b, init, design, mcmc.par)
bayessurvreg3.writeHeaders(dir, doubly, prior.init,
priorb.di, priorb2.di, store, design, design2,
version, mclass)
bayessurvreg3.priorinitNb(priorinit.Nb, init, init2,
design, design2, doubly)
bayessurvreg3.checkrho(rho, doubly)
Arguments
store |
a list as required by the argument |
prior |
a list as required by the argument |
prior2 |
a list as required by the argument |
init |
a list as required by the argument |
init2 |
a list as required by the argument |
mcmc.par |
a list as required by the argument |
mcmc.par2 |
a list as required by the argument |
design |
an object as returned by the function
|
design2 |
an object as returned by the function
|
doubly |
logical indicating whether the response is doubly censored or not |
prior.beta |
a list as required by the argument |
prior.b |
a list as required by the argument |
dir |
path to the directory where the sampled values are to be stored |
prior.init |
a list as returned by the function
|
priorb.di |
a list as returned by the function
|
priorb2.di |
a list as returned by the function
|
priorinit.Nb |
a list as required by the argument
|
rho |
a list as required by the argument |
version |
it is equal to 3 if either there is no correlation coefficient between the onset and time-to-event random intercepts or this correlation coefficient is fixed to 0 It is equal to 31 if we are estimating correlation coefficient between the onset and time-to-event random intercepts. |
mclass |
object created by |
Value
Some lists (in most cases).
Value for bayessurvreg3.priorb
A list with the following components:
- bparmI
integer arguments for C++
RandomEff
constructor- bparmD
double arguments for C++
RandomEff
constructor- GsplI
integer arguments for C++
Gspline
constructor related to the smothed density of the random intercept- GsplD
double arguments for C++
Gspline
constructor related to the smoothed density of the random intercept- specification
1 or 2, one of the G-spline specifications related to the distribution of the random intercept
- r
initial component labels (vector of size
ncluster
) taking values from 1 to the total length of the G-spline related to the random intercept
and the following attributes:
prior.b |
init |
mcmc.par |
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Chronic Granulomatous Disease data
Description
Dataset from Fleming and Harrington (1991).
Data from a multicenter placebo-controlled randomized trial of gamma
inferon in patients with chronic granulomatous disease (CGD). 128
patients were randomized to either gamma inferon (n=63
) or
placebo (n=65
). For each patient the times from study
entry to initial and any recurrent serious infections are
available. There is a minimum of one and a maximum of eight
(recurrent) infection times per patient, with a total of 203 records.
Usage
data(cgd)
Format
a~data frame with 203 rows and 17 columns. There are the following variables contained in the data frame.
- hospit
code of the hospital
- ID
case identification number
- RDT
date randomization onto study (mmddyy)
- IDT
date of onset of serious infection, or date the patient was taken off the study (mmddyy)
- trtmt
treatment code, 1 = rIFN-gamma, 2 = placebo
- inherit
pattern of inheritance, 1 = X-linked, 2 = autosomal recessive
- age
age in years
- height
height of the patient in cm
- weight
weight of the patient in kg
- cortico
using corticosteroids at time of study entry, 1 = yes, 2 = no
- prophy
using prophylatic antibiotics at time of study entry, 1 = yes, 2 = no
- gender
1 = male, 2 = female
- hcat
hospital category, 1 = US-NIH, 2 = US-other, 2 = Europe - Amsterdam, 4 = Europe - other
- T1
elapsed time (in days) from randomization (from sequence = 1 record) to diagnosis of a serious infection or, if a censored observation, elapsed time from randomization to censoring date; computed as IDT - RDT (from sequence = 1 record)
- T2
0, for sequence = 1 record, if sequence > 1, T2 = T1(from previous record) + 1
- event
censoring indicator, 1 = non-censored observation, 2 = censored observation
- sequence
sequence number, for each patient, the infection recods are in sequence number order
Source
Appendix D.2 of Fleming and Harrington (1991).
References
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. New York: John Wiley and Sons.
Compute a simultaneous credible region (rectangle) from a sample for a vector valued parameter.
Description
See references below for more details.
Usage
credible.region(sample, probs=c(0.90, 0.975))
Arguments
sample |
a data frame or matrix with sampled values (one column = one parameter) |
probs |
probabilities for which the credible regions are to be computed |
Value
A list (one component for each confidence region) of length equal to
length(probs)
. Each component of the list is a matrix with two
rows (lower and upper limit) and as many columns as the number of
parameters giving the confidence region.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Besag, J., Green, P., Higdon, D. and Mengersen, K. (1995). Bayesian computation and stochastic systems (with Discussion). Statistical Science, 10, 3 - 66, page 30
Held, L. (2004). Simultaneous inference in risk assessment; a Bayesian perspective In: COMPSTAT 2004, Proceedings in Computational Statistics (J. Antoch, Ed.), 213 - 222, page 214
Held, L. (2004b). Simultaneous posterior probability statements from Monte Carlo output. Journal of Computational and Graphical Statistics, 13, 20 - 35.
Examples
m <- 10000
sample <- data.frame(x1=rnorm(m), x2=rnorm(m), x3=rnorm(m))
probs <- c(0.70, 0.90, 0.95)
CR <- credible.region(sample, probs=probs)
for (kk in 1:length(CR)){
suma <- sum(sample$x1 >= CR[[kk]]["Lower", "x1"] & sample$x1 <= CR[[kk]]["Upper", "x1"] &
sample$x2 >= CR[[kk]]["Lower", "x2"] & sample$x2 <= CR[[kk]]["Upper", "x2"] &
sample$x3 >= CR[[kk]]["Lower", "x3"] & sample$x3 <= CR[[kk]]["Upper", "x3"])
show <- c(suma/m, probs[kk])
names(show) <- c("Empirical", "Desired")
print(show)
}
Probability density function estimate from MCMC output
Description
Displays a plot of the density estimate for each variable in x, calculated by the density function.
This is slightly modified version of densplot
function
of a coda
package to conform to my personal preferences.
Usage
densplot2(x, plot = TRUE, show.obs = FALSE, bwf, bty = "n", main = "",
xlim, ylim, xlab, ylab, ...)
Arguments
x |
|
plot |
if |
show.obs |
show observations along the x-axis? |
bwf |
function for calculating the bandwidth. If omitted, the bandwidth is calculate by 1.06 times the minimum of the standard deviation and the interquartile range divided by 1.34 times the sample size to the negative one fifth power. |
xlim , ylim , xlab , ylab |
further arguments passed to the
|
bty , main , ... |
further arguments passed to the
|
Value
No return value.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Write headers to or clean files with sampled G-spline
Description
These functions are not to be called by ordinary users.
These are just sub-parts of some *.writeHeaders
functions to make them more
readable for the programmer.
Usage
clean.Gspline(dir, label, care.of.y=TRUE)
write.headers.Gspline(dir, dim, nP, label, gparmi,
store.a, store.y, store.r, care.of.y=TRUE)
Arguments
dir |
a~string giving the path to the directory where to search or store G-spline files |
label |
a~string with extension of the G-spline files |
dim |
dimension of the G-spline |
nP |
number of (bivariate) observations |
gparmi |
a~vector with integer arguments required by the
constructor of C++ class |
store.a |
logical, store transformed weights? |
store.y |
logical, store augmented observations? |
store.r |
logical, store allocations? |
care.of.y |
logical, do we wish to take care of the |
Value
No return value.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Read the sampled values from the Bayesian survival regression model to a coda mcmc object.
Description
This function creates a coda
mcmc
object from values found
in files where sampled values from bayessurvreg1
function are stored
or from data.frames.
Usage
files2coda(files, data.frames, variant = 1, dir = getwd(),
start = 1, end, thin = 1, header = TRUE, chain)
Arguments
files |
a vector of strings giving the names of files that are to
be converted to |
data.frames |
a vector of strings giving the names of data.frames
that are to be converted to |
variant |
a variant of Currently only 1 is supported to cooperate with |
dir |
string giving the directory where it will be searched for the files with sampled values. |
start |
the first row (possible header does not count) from the files with the sampled values that will be converted to coda objects. |
end |
the last row from the files with the sampled values that will be converted to coda objects. If missing, it is the last row in files. |
thin |
additional thinning of sampled values (i.e. only every
|
header |
TRUE or FALSE indicating whether the files with the sampled values contain also the header on the first line or not. |
chain |
parameter giving the number of the chain if parallel
chains were created and sampled values stored in data.frames further
stored in lists(). If |
Value
A list with mcmc
objects. One object per file or data.frame.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Examples
## *** illustration of usage of parameters 'data.frames' and 'chain' ***
## *********************************************************************
## Two parallel chains with four variables, stored in data.frames
## data.frames are further stored in lists
library("coda")
group1 <- list(); group2 <- list(); group3 <- list()
## first chain of first two variables:
group1[[1]] <- data.frame(var1 = rnorm(100, 0, 1), var2 = rnorm(100, 5, 4))
## second chain of first two variables:
group1[[2]] <- data.frame(var1 = rnorm(100, 0, 1), var2 = rnorm(100, 5, 4))
## first chain of the third variable:
group2[[1]] <- data.frame(var3 = rgamma(100, 1, 1))
## second chain of the third variable:
group2[[2]] <- data.frame(var3 = rgamma(100, 1, 1))
## first chain of the fourth variable:
group3[[1]] <- data.frame(var4 = rbinom(100, 1, 0.4))
## second chain of the fourth variable:
group3[[2]] <- data.frame(var4 = rbinom(100, 1, 0.4))
## Create mcmc objects for each chain separately
mc.chain1 <- files2coda(data.frames = c("group1", "group2", "group3"), chain = 1)
mc.chain2 <- files2coda(data.frames = c("group1", "group2", "group3"), chain = 2)
## Create mcmc.list to represent two parallel chains
mc <- mcmc.list(mc.chain1, mc.chain2)
rm(mc.chain1, mc.chain2)
## *** illustration of usage of parameter 'data.frames' without 'chain' ***
## ************************************************************************
## Only one chain for four variables was sampled and stored in three data.frames
## chain of first two variables:
group1 <- data.frame(var1 = rnorm(100, 0, 1), var2 = rnorm(100, 5, 4))
## chain of the third variable:
group2 <- data.frame(var3 = rgamma(100, 1, 1))
## chain of the fourth variable:
group3 <- data.frame(var4 = rbinom(100, 1, 0.4))
## Create an mcmc object
mc <- files2coda(data.frames = c("group1", "group2", "group3"))
Check and possibly fill in initial values for the G-spline, augmented observations and allocations for Bayesian models with G-splines
Description
These functions are not to be called by ordinary users.
These are just sub-parts of bayesBisurvreg.priorInit
and
related functions to make them more readable for the programmer.
Usage
give.init.Gspline(prior, init, mcmc.par, dim)
give.init.y(init.y, dim, y.left, y.right, status)
give.init.y2(init.y, init2.y, dim, design, design2, doubly)
give.init.r(init.r, init.y, dim, KK,
gamma, sigma, c4delta, intcpt, scale)
Arguments
prior |
a~list as required by |
init |
a~list as required by |
mcmc.par |
a~list as required by |
dim |
dimension of the G-spline/response, 1 or 2. |
init.y |
initial (augmented) observations possibly given by the user.
They are partially checked for consistency and these supplied by the user
used in the resulting object. This should be either vector of length |
init2.y |
initial (augmented) times-to-event (if doubly censoring) possibly given by the user.
They are partially checked for consistency and these supplied by the user
used in the resulting object. This should be either vector of length |
design |
an~object as returned by the function
|
design2 |
an~object as returned by the function
|
doubly |
logical indicating whether the response is doubly censored or not |
y.left |
observed, left or right censored log(event time) or the lower limit
of the interval censored observation. Sorted in a~transposed order compared
to |
y.right |
upper limit of the interval censored observation, whatever if the observation
is not interval-censored sorted in a~transposed order compared to |
status |
status indicator vector/matrix. 1 for observed times, 0 for right censored times, 2 for left censored times, 3 for interval censored times. |
init.r |
initial allocations possibly given by the user. This should be a~vector of length
|
KK |
vector of length |
gamma |
vector of length |
sigma |
vector of length |
c4delta |
vector of length |
intcpt |
vector of length |
scale |
vector of length |
Value
Some lists.
Value for give.init.Gspline
A~list with the following components:
- Gparmi
integer parameters for the G-spline constructor in the C++ code
- Gparmd
double parameters for the G-spline constructor in the C++ code
- specification
specification of the G-spline model (1 or 2), see
bayesHistogram
for more detail
and the following attributes:
init |
prior |
mcmc.par |
Value for give.init.y
A~vector or matrix with the same structure as init.y
, i.e. with 2~columns and
n
rows in the case of the bivariate data.
Value for give.init.y2
A~list with the following components:
- init.y
a~vector of length
n
or a~matrix with 2 columns andn
rows with initial log(onset times)- init2.y
a~vector of length
n
or a~matrix with 2 columns andn
rows with initial log(times-to-event). If the data are not doubly cdensored, this object is equal to 0.- y1.left
lower limit of the log-response (or exact/right/left censored observation) as required by the C++ function
bayesBisurvreg
, related to the onset time in the case of doubly censoring and to the event time otherwise- y1.right
upper limit of the log-response as required by the C++ function
bayesBisurvreg
, related to the onset time in the case of doubly censoring and to the event time otherwise- status1
status vector as required by the C++ function
bayesBisurvreg
related to the onset time in the case of doubly censoring and to the event time otherwise- t2.left
lower limit of the response as required by the C++ function
bayesBisurvreg
, related to time-to-event in the case of doubly censoring, equal to 0 if there is no doubly-censoring- t2.right
upper limit of the response as required by the C++ function
bayesBisurvreg
, related to time-to-event in the case of doubly censoring, equal to 0 if there is no doubly-censoring- status2
status vector related to time-to-event in the case of doubly censoring, equal to 0 otherwise.
Value for give.init.r
To be added somewhen...
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Brief summary for the chain(s) obtained using the MCMC.
Description
This function computes a sample mean, quantiles and a Bayesian
p
-value which is defined as
p = 2\times\min(n_{-},\,n_{+}),
where
n_{-}
is the number of the sampled values which are
negative and n_{+}
is the number of sampled values which
are positive.
Usage
give.summary(sample, probs=c(0.5, 0.025, 0.975))
Arguments
sample |
a data frame or a vector with sampled values |
probs |
probabilities of the quantiles that are to be computed |
Value
A matrix or a vector with the sample mean, quantiles and a Bayesian p
-value.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Examples
## Example with a sample stored in a vector:
sample <- rnorm(1000)
give.summary(sample)
## Example with a sample stored in a data.frame:
sample <- data.frame(x=rnorm(1000), y=rgamma(1000, shape=1, rate=1))
give.summary(sample)
Summary for the marginal density estimates based on the bivariate model with Bayesian G-splines.
Description
Compute the estimate of the marginal density function based on the values sampled using the MCMC (MCMC average evaluated in a grid of values) in a model where density is specified as a bivariate Bayesian G-spline.
This function serves to summarize the MCMC chains related to the distributional parts
of the considered models obtained using the functions:
bayesHistogram
and bayesBisurvreg
.
If asked, this function returns also the values of the marginal G-spline evaluated in a grid at each iteration of MCMC.
Usage
marginal.bayesGspline(dir, extens = "", K, grid1, grid2,
skip = 0, by = 1, last.iter, nwrite, only.aver = TRUE)
Arguments
dir |
directory where to search for files (‘mixmoment.sim’, ‘mweight.sim’, ‘mmean.sim’, ‘gspline.sim’) with the MCMC sample. |
extens |
an extension used to distinguish different sampled
G-splines if more G-splines were used in one simulation (e.g. with
doubly-censored data). According to which
|
K |
a~vector of length 2 specifying the number of knots at each side of the middle knot for each dimension of the G-spline. |
grid1 |
grid of values from the first dimension at which the sampled marginal densities are to be evaluated. |
grid2 |
grid of values from the second dimension at which the sampled marginal densities are to be evaluated. |
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. |
by |
additional thinning of the sample. |
last.iter |
index of the last row from *.sim files that should be
used. If not specified than it is set to the maximum available
determined according to the file |
nwrite |
frequency with which is the user informed about the
progress of computation (every |
only.aver |
|
Value
An object of class marginal.bayesGspline
is returned. This object is a
list with components margin1
and margin2
(for two margins).
Both margin1
and margin2
components are data.frames with
columns named grid
and average
where
grid |
is a grid of values (vector) at which the McMC average of the marginal G-spline was computed. |
average |
are McMC averages of the marginal G-spline (vector) evaluated in
|
There exists a method to plot objects of the class marginal.bayesGspline
.
Attributes
Additionally, the object of class marginal.bayesGspline
has the following
attributes:
sample.size
a length of the McMC sample used to compute the McMC average.
sample1
marginal (margin = 1) G-spline evaluated in a grid of values. This attribute is present only if
only.aver = FALSE
.This a matrix with
sample.size
columns and length(grid1) rows.sample2
marginal (margin = 2) G-spline evaluated in a grid of values. This attribute is present only if
only.aver = FALSE
.This a matrix with
sample.size
columns and length(grid2) rows.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failurew time model for paired doubly interval-censored data. Statistical Modelling, 6, 3 - 22.
Examples
## See the description of R commands for
## the models described in
## Komarek (2006),
## Komarek and Lesaffre (2006),
##
## R commands available
## in the documentation
## directory of this package
## - see ex-tandmobPA.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobPA.pdf
##
Plot an object of class bayesDensity
Description
This function plots an object created by bayesDensity
.
Usage
## S3 method for class 'bayesDensity'
plot(x, k.cond, dim.plot = TRUE, over = TRUE,
alegend = TRUE, standard = TRUE, center = FALSE,
type = "l", bty = "n",
xlab = expression(epsilon), ylab = expression(f(epsilon)),
lty, xlim, ylim, xleg, yleg, main, ...)
Arguments
x |
an object of class |
k.cond |
a numerical vector giving the numbers of mixture components for which the conditional densities
are to be plotted. 0 states for the unconditional (overall) density, averaged over the mixture with all possible
numbers of components. If NULL, all conditional and the
unconditional density found in |
dim.plot |
an indicator whether the dimension of the plot used
in |
over |
an indicator whether all densities should be drawn into
one plot using different types of lines. If |
alegend |
an indicator whether an automatic legend should be added to the plot. |
standard |
logical, do we want to plot standardized density? |
center |
logical, do we want to plot centered density?, set both
|
xleg , yleg |
position of the legend if |
type , bty , xlab , ylab , lty , xlim , ylim , main , ... |
other
arguments passed to the |
Value
No return value.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Plot an object of class bayesGspline
Description
This function plots an object created by bayesGspline
.
Usage
## S3 method for class 'bayesGspline'
plot(x, add = FALSE, type = "l", lty=1, bty = "n",
xlab, ylab, main, sub, ...)
Arguments
x |
an object of class |
add |
if |
type , lty , bty , xlab , ylab , main , sub , ... |
other
arguments passed to the |
Value
No return value.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Plot an object of class marginal.bayesGspline
Description
This function plots an object created by marginal.bayesGspline
.
Usage
## S3 method for class 'marginal.bayesGspline'
plot(x, type = "l", lty=1, bty = "n",
xlab1, ylab1, main1, xlab2, ylab2, main2, sub, ...)
Arguments
x |
an object of class |
type , lty , bty , xlab1 , ylab1 , main1 , xlab2 , ylab2 , main2 , sub , ... |
other
arguments passed to the |
Value
No return value.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Compute predictive quantities based on a Bayesian survival regression model fitted using bayessurvreg1 function.
Description
This function runs additional McMC to compute predictive survivor and hazard curves and predictive event times for specified values of covariates.
Firstly, the function bayessurvreg1
has to be used to
obtain a sample from the posterior distribution of unknown quantities.
Directly, posterior predictive quantiles and means of asked quantities are computed and stored in files.
Function predictive.control
serves only to perform some input
checks inside the main function predictive
.
Usage
predictive(formula, random, time0 = 0, data = parent.frame(),
grid, type = "mixture", subset, na.action = na.fail,
quantile = c(0, 0.025, 0.5, 0.975, 1),
skip = 0, by = 1, last.iter, nwrite, only.aver = FALSE,
predict = list(Et=TRUE, t=FALSE, Surv=TRUE, hazard=FALSE, cum.hazard=FALSE),
store = list(Et=TRUE, t = FALSE, Surv = FALSE, hazard = FALSE, cum.hazard=FALSE),
Eb0.depend.mix = FALSE,
dir, toler.chol = 1e-10, toler.qr = 1e-10)
predictive.control(predict, store, only.aver, quantile)
Arguments
formula |
the same formula as that one used to sample from the
posterior distribution of unknown quantities by the function
|
random |
the same |
time0 |
starting time for the survival model. This option is used
to get correct hazard function in the case that the original model was
|
data |
optional data frame in which to interpret the variables
occuring in the formulas. Usually, you create a new
|
grid |
a list of length as number of observations in |
type |
a character string giving the type of assumed error distribution. Currently, valid are substrings of "mixture". In the future, "spline", "polya.tree" might be also implemented. |
subset |
subset of the observations from the |
na.action |
function to be used to handle any |
quantile |
a vector of quantiles that are to be computed for each predictive quantity. |
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. |
by |
additional thinning of the sample. |
last.iter |
index of the last row from *.sim files that should be
used. If not specified than it is set to the maximum available
determined according to the file |
nwrite |
frequency with which is the user informed about the
progress of computation (every |
only.aver |
if |
predict |
a list of logical values indicating which predictive quantities are to be sampled. Components of the list:
|
store |
a list of logical values indicating which predictive quantities are to be stored in files as ‘predET*.sim’, ‘predT*.sim’, ‘predS*.sim’, ‘predhazard*.sim’, ‘predcumhazard*.sim’. If you are interested only in posterior means or quantiles of the predictive quantities you do not have to store sampled values. Posterior means and quantiles are stored in files ‘quantET*.sim’, ‘quantT*.sim’, ‘quantS*.sim’, ‘quanthazard*.sim’, ‘quantpredhazard*.sim’. |
Eb0.depend.mix |
a logical value indicating whether the mean of
the random intercept (if included in the model) was given in a
hierarchical model as an overall mean of the mixture in the error
term. With |
dir |
a string giving a directory where previously simulated values were stored and where newly obtained quantities will be stored. On Unix, do not use ‘~/’ to specify your home directory. A full path must be given, e.g. ‘/home/arnost/’. |
toler.chol |
tolerance for the Cholesky decomposition. |
toler.qr |
tolerance for the QR decomposition. |
Value
An integer which should be equal to zero if everything ran fine.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2007). Bayesian accelerated failure time model for correlated interval-censored data with a normal mixture as an error distribution. Statistica Sinica, 17, 549 - 569.
Examples
## See the description of R commands for
## the models described in
## Komarek (2006),
## Komarek and Lesaffre (2007).
##
## R commands available
## in the documentation
## directory of this package as
## - ex-cgd.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-cgd.pdf
##
## - ex-tandmobMixture.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobMixture.pdf
##
Compute predictive quantities based on a Bayesian survival regression model fitted using bayesBisurvreg or bayessurvreg2 or bayessurvreg3 functions.
Description
This function computes predictive densities, survivor and hazard curves for specified combinations of covariates.
Firstly, either the function bayesBisurvreg
or the
function bayessurvreg2
or the function bayessurvreg3
has to be used to obtain a sample from the posterior distribution of unknown quantities.
Function predictive2.control
serves only to perform some input
checks inside the main function predictive2
.
Usage
predictive2(formula, random, obs.dim, time0, data = parent.frame(),
grid, na.action = na.fail, Gspline,
quantile = c(0, 0.025, 0.5, 0.975, 1),
skip = 0, by = 1, last.iter, nwrite,
only.aver = TRUE,
predict = list(density=FALSE, Surv=TRUE,
hazard=FALSE, cum.hazard=FALSE),
dir, extens = "", extens.random="_b", version=0)
predictive2Para(formula, random, obs.dim, time0, data = parent.frame(),
grid, na.action = na.fail, Gspline,
quantile = c(0, 0.025, 0.5, 0.975, 1),
skip = 0, by = 1, last.iter, nwrite,
only.aver = TRUE,
predict = list(density=FALSE, Surv=TRUE,
hazard=FALSE, cum.hazard=FALSE),
dir, extens = "", extens.random="_b", version=0)
predictive2.control(predict, only.aver, quantile, obs.dim,
time0, Gspline, n)
Arguments
formula |
the same formula as that one used to sample from the
posterior distribution of unknown quantities by the function
REMARK: the prediction must be asked for at least two combinations of covariates. This is the restriction imposed by one of the internal functions I use. | ||
random |
the same | ||
obs.dim |
a vector that has to be supplied if bivariate data were
used for estimation (using the function
| ||
time0 |
a~vector of length | ||
data |
optional data frame in which to interpret the variables
occuring in the formulas. Usually, you create a new
| ||
grid |
a~vector giving the grid of values where predictive
quantities are to be evaluated. The grid should normally start at some
value slightly higher than | ||
na.action |
function to be used to handle any | ||
Gspline |
a~list specifying the G-spline used for the error distribution in the model. It is a~list with the following components:
| ||
quantile |
a vector of quantiles that are to be computed for each predictive quantity. | ||
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. | ||
by |
additional thinning of the sample. | ||
last.iter |
index of the last row from *.sim files that should be
used. If not specified than it is set to the maximum available
determined according to the file | ||
nwrite |
frequency with which is the user informed about the
progress of computation (every | ||
only.aver |
if The word of warning: with | ||
predict |
a list of logical values indicating which predictive quantities are to be computed. Components of the list:
| ||
dir |
directory where to search for files (‘mixmoment.sim’, ‘mweight.sim’, mmean.sim', gspline.sim', 'beta.sim', 'D.sim', ...) with the McMC sample. | ||
extens |
an extension used to distinguish different sampled G-splines if more formulas were used in one MCMC simulation (e.g. with doubly-censored data).
| ||
extens.random |
only applicable if the function
This is an extension used to distinguish different sampled G-splines determining the distribution of the random intercept (under the presence of doubly-censored data).
| ||
version |
this argument indicates by which
| ||
n |
number of covariate combinations for which the prediction will be performed. |
Value
A list with possibly the following components (what is included depends
on the value of the arguments predict
and only.aver
):
grid |
a~vector with the grid values at which the survivor function, survivor density, hazard and cumulative hazard are computed. |
Surv |
predictive survivor functions. A~matrix with as many columns as length(grid) and as many rows as the number of covariate combinations for which the predictive quantities were asked. One row per covariate combination. |
density |
predictive survivor densities. The same structure as |
hazard |
predictive hazard functions. The same structure as |
cum.hazard |
predictive cumulative hazard functions. The same structure as |
quant.Surv |
pointwise quantiles for the predictive survivor functions. This is a list with as many components as the number of covariate combinations. One component per covariate combination. Each component of this list is a~matrix with as many columns as
length(grid) and as many rows as the length of the argument
|
quant.density |
pointwise quantiles for the predictive survivor densities. The same structure as |
quant.hazard |
pointwise quantiles for the predictive hazard functions. The same structure as |
quant.cum.hazard |
pointwise quantiles for the predictive cumulative hazard functions. The same structure as |
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523 - 533.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failurew time model for paired doubly interval-censored data. Statistical Modelling, 6, 3 - 22.
Komárek, A., Lesaffre, E., and Legrand, C. (2007). Baseline and treatment effect heterogeneity for survival times between centers using a random effects accelerated failure time model with flexible error distribution. Statistics in Medicine, 26, 5457 - 5472.
Examples
## See the description of R commands for
## the models described in
## Komarek (2006),
## Komarek and Lesaffre (2006),
## Komarek and Lesaffre (2008),
## Komarek, Lesaffre, and Legrand (2007).
##
## R commands available in the documentation
## directory of this package
## - ex-tandmobPA.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobPA.pdf
## - ex-tandmobCS.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobCS.pdf
## - ex-eortc.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-eortc.pdf
Print a summary for the density estimate based on the Bayesian model.
Description
This function prints a~object created by bayesDensity
.
Usage
## S3 method for class 'bayesDensity'
print(x, ...)
Arguments
x |
an object of class |
... |
this is here for a consistency with a generic function. |
Value
No return value.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Sample from the multivariate normal distribution
Description
According to the parametrization used, sample from the multivariate normal distribution.
The following parametrization can be specified
- standard
-
In this case we sample from either
\mathcal{N}(\mu, \Sigma)
or from\mathcal{N}(\mu, Q^{-1}).
- canonical
-
In this case we sample from
\mathcal{N}(Q^{-1}b,\;Q^{-1}).
Generation of random numbers is performed by Algorithms 2.3-2.5 in Rue and Held (2005, pp. 34-35).
Usage
rMVNorm(n, mean=0, Sigma=1, Q, param=c("standard", "canonical"))
Arguments
n |
number of observations to be sampled. |
mean |
For For |
Sigma |
covariance matrix of the multivariate normal
distribution. It is ignored if |
Q |
precision matrix of the multivariate normal distribution. It does not have to be supplied provided It must be supplied if |
param |
a character which specifies the parametrization. |
Value
Matrix with sampled points in rows.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Boca Raton: Chapman and Hall/CRC.
See Also
Examples
### Mean, covariance matrix, its inverse
### and the canonical mean
mu <- c(0, 2, 0.5)
L <- matrix(c(1, 1, 1, 0, 0.5, 0.5, 0, 0, 0.3), ncol=3)
Sigma <- L %*% t(L)
Q <- chol2inv(t(L))
b <- Q %*% mu
print(Sigma)
print(Q)
print(Sigma %*% Q)
### Sample using different parametrizations
set.seed(775988621)
n <- 10000
### Sample from N(mu, Sigma)
xx1 <- rMVNorm(n=n, mean=mu, Sigma=Sigma)
apply(xx1, 2, mean)
var(xx1)
### Sample from N(mu, Q^{-1})
xx2 <- rMVNorm(n=n, mean=mu, Q=Q)
apply(xx2, 2, mean)
var(xx2)
### Sample from N(Q^{-1}*b, Q^{-1})
xx3 <- rMVNorm(n=n, mean=b, Q=Q, param="canonical")
apply(xx3, 2, mean)
var(xx3)
Sample from the Wishart distribution
Description
Sample from the Wishart distribution
\mbox{Wishart}(\nu, S),
where \nu
are degrees of freedom of the Wishart distribution
and S
is its scale matrix. The same parametrization as in
Gelman (2004) is assumed, that is, if
W\sim\mbox{Wishart}(\nu,S)
then
\mbox{E}(W) = \nu S
.
In the univariate case, \mbox{Wishart}(\nu,S)
is the
same as \mbox{Gamma}(\nu/2, 1/(2S)).
Generation of random numbers is performed by the algorithm described in Ripley (1987, pp. 99).
Usage
rWishart(n, df, S)
Arguments
n |
number of observations to be sampled. |
df |
degrees of freedom of the Wishart distribution. |
S |
scale matrix of the Wishart distribution. |
Value
Matrix with sampled points (lower triangles of W
) in rows.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004). Bayesian Data Analysis, Second edition. Boca Raton: Chapman and Hall/CRC.
Ripley, B. D. (1987). Stochastic Simulation. New York: John Wiley and Sons.
Examples
### The same as rgamma(n, shape=df/2, rate=1/(2*S))
n <- 1000
df <- 1
S <- 3
w <- rWishart(n=n, df=df, S=S)
mean(w) ## should be close to df*S
var(w) ## should be close to 2*df*S^2
### Multivariate Wishart
n <- 1000
df <- 2
S <- matrix(c(1,3,3,13), nrow=2)
w <- rWishart(n=n, df=df, S=S)
apply(w, 2, mean) ## should be close to df*S
df*S
df <- 2.5
S <- matrix(c(1,2,3,2,20,26,3,26,70), nrow=3)
w <- rWishart(n=n, df=df, S=S)
apply(w, 2, mean) ## should be close to df*S
df*S
Compute a sample covariance matrix.
Description
This function computes a sample covariance matrix.
Usage
sampleCovMat(sample)
Arguments
sample |
a |
Details
When y_1, \dots, y_n
is a sequence of
p
-dimensional vectors y_i
the sample covariance
matrix S
is equal to
S = \frac{1}{n-1} \sum_{i=1}^n (y_i - m)(y_i - m)^T
where
m = \frac{1}{n}\sum_{i=1}^n y_i.
When n=1
the function returns just sum of squares.
Value
This function returns a matrix.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Examples
## Sample some values
z1 <- rnorm(100, 0, 1) ## first components of y
z2 <- rnorm(100, 5, 2) ## second components of y
z3 <- rnorm(100, 10, 0.5) ## third components of y
## Put them into a data.frame
sample <- data.frame(z1, z2, z3)
## Compute a sample covariance matrix
sampleCovMat(sample)
Estimate of the Kendall's tau from the bivariate model
Description
This function computes an estimate of the residual (after adjustment
for covariates) Kendall's tau for the bivariate survival model fitted
using the functions bayesHistogram
or
bayesBisurvreg
.
For both these function their argument prior$specification
must
be equal to 2!
When G
is a bivariate distribution function, the population
version of the Kendall's tau is defined as
\tau = 4\int G dG - 1
.
For the model estimated using one of the above mentioned functions the value of Kendall's tau at each iteration of MCMC is equal to
\tau =
4\sum_{i=-K_1}^{K_1}\sum_{j=-K_2}^{K_2}\sum_{k=-K_1}^{K_1}\sum_{l=-K_2}^{K_2}w_{i,j} w_{k,l}
\Phi\left(\frac{\mu_{1,i} - \mu_{1,k}}{\sqrt{2}\sigma_1}\right)
\Phi\left(\frac{\mu_{2,j} - \mu_{2,l}}{\sqrt{2}\sigma_2}\right)
- 1,
where \mu_{1,-K_1},\dots,\mu_{1,K_1}
are knots in the first margin,
\mu_{2,-K_2},\dots,\mu_{2,K_2}
are knots in the second margin,
\sigma_1
is the basis standard deviation in the first margin,
\sigma_2
is the basis standard deviation in the second margin,
and w_{i,j},\;i=-K_1,\dots,K_1,\;j=-K_2,\dots,K_2
are the G-spline weights.
Usage
sampled.kendall.tau(dir = getwd(), extens = "", K,
skip = 0, by = 1, last.iter, nwrite)
Arguments
dir |
directory where to search for files (‘mixmoment.sim’, ‘mweight.sim’, ‘mmean.sim’, ‘gspline.sim’) with the MCMC sample. |
extens |
an extension used to distinguish different sampled
G-splines if more G-splines were used in one simulation (with
doubly-censored data) According to which
|
K |
a~vector of length 2 specifying the number of knots at each side of the middle knot for each dimension of the G-spline. |
skip |
number of rows that should be skipped at the beginning of each *.sim file with the stored sample. |
by |
additional thinning of the sample. |
last.iter |
index of the last row from *.sim files that should be
used. If not specified than it is set to the maximum available
determined according to the file |
nwrite |
frequency with which is the user informed about the
progress of computation (every |
Value
A vector with sampled values of the Kendall's tau.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failurew time model for paired doubly interval-censored data. Statistical Modelling, 6, 3 - 22.
Examples
## See the description of R commands for
## the models described in
## Komarek (2006),
## Komarek and Lesaffre (2006),
##
## R commands available
## in the documentation
## directory of this package
## - see ex-tandmobPA.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/bayesSurv/ex-tandmobPA.pdf
##
Read Data Values
Description
Read numeric data into a data frame from a file. Header is assumed to be present in the file.
Usage
scanFN(file, quiet=FALSE)
Arguments
file |
the name of a file to read data values from. If the
specified file is Otherwise, the file name is interpreted relative to the
current working directory (given by Alternatively,
To read a data file not in the current encoding (for example a
Latin-1 file in a UTF-8 locale or conversely) use a
|
quiet |
logical: if |
Details
See scan
.
Value
data.frame
with read data values.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
See Also
Examples
cat("x y z", "1 2 3", "1 4 6", "10 20 30", file="ex.data", sep="\n")
pp <- scanFN("ex.data", quiet=FALSE)
pp <- scanFN("ex.data", quiet= TRUE)
print(pp)
unlink("ex.data") # tidy up
Compute a simultaneous p-value from a sample for a vector valued parameter.
Description
The p-value is computed as 1 - the credible level of the credible region which just cover the point (0, 0, ..., 0)'.
The function returns also the simultaneous credible region (rectangle) with a specified credible level.
Usage
simult.pvalue(sample, precision=0.001, prob=0.95)
## S3 method for class 'simult.pvalue'
print(x, ...)
Arguments
sample |
a data frame or matrix with sampled values (one column = one parameter) |
precision |
precision with which the p-value is to be computed |
prob |
probability for which the credible region is to be computed |
x |
an object of class simult.pvalue |
... |
who knows |
Value
An object of class 'simult.pvalue'.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Besag, J., Green, P., Higdon, D. and Mengersen, K. (1995). Bayesian computation and stochastic systems (with Discussion). Statistical Science, 10, 3 - 66. page 30
Held, L. (2004). Simultaneous posterior probability statements from Monte Carlo output. Journal of Computational and Graphical Statistics, 13, 20 - 35.
Examples
m <- 1000
sample <- data.frame(x1=rnorm(m), x2=rnorm(m), x3=rnorm(m))
simult.pvalue(sample)
sample <- data.frame(x1=rnorm(m), x2=rnorm(m), x3=rnorm(m, mean=2))
simult.pvalue(sample)
sample <- data.frame(x1=rnorm(m), x2=rnorm(m), x3=rnorm(m, mean=5))
simult.pvalue(sample, prob=0.99, precision=0.0001)
Signal Tandmobiel data, version 2
Description
This is the dataset resulting from a longitudinal prospective dental study performed in Flanders (North of Belgium) in 1996 – 2001. The cohort of 4\,468 randomly sampled children who attended the first year of the basic school at the beginning of the study was annualy dental examined by one of 16 trained dentists. The original dataset consists thus of at most 6 dental observations for each child.
The dataset presented here contains mainly the information on the emergence and caries times summarized in the interval-censored observations. Some baseline covariates are also included here.
For more detail on the design of the study see Vanobbergen et al. (2000).
This data set was used in the analyses presented in Komárek et al. (2005), in Lesaffre, Komárek, and Declerck (2005) and in Komárek and Lesaffre (2007).
IMPORTANT NOTICE: It is possible to use these data for your
research work under the condition that each manuscript is first
approved by
Prof. Emmanuel Lesaffre
Leuven Biostatistics and statistical Bioinformatics Centre (L-BioStat)
Katholieke Universiteit Leuven
Kapucijnenvoer 35
B-3000 Leuven
Belgium
<emmanuel.lesaffre@kuleuven.be
>
Usage
data(tandmob2)
Format
a~data frame with 4\,430 rows (38 sampled children did not come to any of the designed dental examinations) and the following variables
- IDNR
identification number of a child
- GENDER
character boy or girl
- GENDERNum
numeric, 0 = boy, 1 = girl
- DOB
character, date of birth in the format DDmmmYY
- PROVINCE
factor, code of the province with
- 0 =
Antwerpen
- 1 =
Vlaams Brabant
- 2 =
Limburg
- 3 =
Oost Vlaanderen
- 4 =
West Vlaanderen
- EDUC
factor, code of the educational system with
- 0 =
Free
- 1 =
Community school
- 2 =
Province/council school
- STARTBR
factor, code indicating the starting age of brushing the teeth (as reported by parents) with
- 1 =
[0, 1] years
- 2 =
(1, 2] years
- 3 =
(2, 3] years
- 4 =
(3, 4] years
- 5 =
(4, 5] years
- 6 =
later than at the age of 5
- FLUOR
binary covariate, 0 = no, 1 = yes. This is the covariate fluorosis used in the paper Komárek et al. (2005).
- BAD.xx
binary, indicator whether a deciduous tooth xx was removed becaues of orthodontical reasons or not.
xx takes values 53, 63, 73, 83 (deciduous lateral canines), 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
- EBEG.xx
lower limit of the emergence (in years of age) of the permanent tooth xx.
NA
if the emergence was left-censored.xx takes values 11, 21, 31, 41 (permanent incisors), 12, 22, 32, 42 (permanent central canines), 13, 23, 33, 43 (permanent lateral canines), 14, 24, 34, 44 (permanent first premolars), 15, 25, 35, 45 (permanent second premolars), 16, 26, 36, 46 (permanent first molars), 17, 27, 37, 47 (permanent second molars).
- EEND.xx
upper limit of the emergence (in years of age) of the permanent tooth xx.
NA
if the emergence was right-censored.xx takes values as for the variable
EBEG.xx
.- FBEG.xx
lower limit for the caries time (in years of age, ‘F’ stands for ‘failure’) of the permanent tooth xx.
NA
if the caries time was left-censored.xx takes values as for the variable
EBEG.xx
.- FEND.xx
upper limit for the caries time (in years of age, ‘F’ stands for ‘failure’) of the permanent tooth xx.
NA
if the caries time was right-censored.xx takes values as for the variable
EBEG.xx
.Unfortunately, for all teeth except 16, 26, 36 and 46 almost all the caries times are right-censored. For teeth 16, 26, 36, 46, the amount of right-censoring is only about 25%.
- Txx.DMF
indicator whether a deciduous tooth xx was decayed or missing due to caries or filled on at most the last examination before the first examination when the emergence of the permanent successor was recorded.
xx takes values 53, 63, 73, 83 (deciduous lateral incisors), 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
- Txx.CAR
indicator whether a~deciduous tooth xx was removed due to the orthodontical reasons or decayed on at most the last examination before the first examination when the emergence of the permanent successor was recorded.
Source
Leuven Biostatistics and statistical Bioinformatics Centre (L-BioStat), Katholieke Universiteit Leuven, Kapucijnenvoer 35, 3000 Leuven, Belgium
URL:
https://gbiomed.kuleuven.be/english/research/50000687/50000696/
Data collection was supported by Unilever, Belgium. The Signal Tandmobiel project comprises the following partners: D. Declerck (Dental School, Catholic University Leuven), L. Martens (Dental School, University Ghent), J. Vanobbergen (Oral Health Promotion and Prevention, Flemish Dental Association), P. Bottenberg (Dental School, University Brussels), E. Lesaffre (Biostatistical Centre, Catholic University Leuven), K. Hoppenbrouwers (Youth Health Department, Catholic University Leuven; Flemish Association for Youth Health Care).
References
Komárek, A., Lesaffre, E.,
\mbox{H\"{a}rk\"{a}nen,}
T., Declerck, D., and
Virtanen, J. I. (2005).
A Bayesian analysis of multivariate doubly-interval-censored dental data.
Biostatistics, 6, 145–155.
Komárek, A. and Lesaffre, E. (2007). Bayesian accelerated failure time model for correlated interval-censored data with a normal mixture as an error distribution. Statistica Sinica, 17, 549–569.
Lesaffre, E., Komárek, A., and Declerck, D. (2005). An overview of methods for interval-censored data with an emphasis on applications in dentistry. Statistical Methods in Medical Research, 14, 539–552.
Vanobbergen, J., Martens, L., Lesaffre, E., and Declerck, D. (2000). The Signal-Tandmobiel project – a longitudinal intervention health promotion study in Flanders (Belgium): baseline and first year results. European Journal of Paediatric Dentistry, 2, 87–96.
Signal Tandmobiel data, version Roos
Description
This is the dataset resulting from a longitudinal prospective dental study performed in Flanders (North of Belgium) in 1996 – 2001. The cohort of 4\,468 randomly sampled children who attended the first year of the basic school at the beginning of the study was annualy dental examined by one of 16 trained dentists. The original dataset consists thus of at most 6 dental observations for each child.
The dataset presented here contains mainly the information on the emergence and caries times summarized in the interval-censored observations. Some baseline covariates are also included here.
For more detail on the design of the study see Vanobbergen et al. (2000).
This is the version of the dataset used first by Leroy et al. (2005)
and contains a subset of the tandmob2
. Some children
were removed to satisfy inclusion criteria given in Leroy et
al. (2005). Additionally, left-censored emergence times of the
permanent first molars are adjusted according to the eruption stage
(see Leroy et al., 2005).
This data set was then used in the analyses presented in Komárek and Lesaffre (2006, 2008).
IMPORTANT NOTICE: It is possible to use these data for your
research work under the condition that each manuscript is first
approved by
Prof. Emmanuel Lesaffre
Leuven Biostatistics and statistical Bioinformatics Centre (L-BioStat)
Katholieke Universiteit Leuven
Kapucijnenvoer 35
B-3000 Leuven
Belgium
<emmanuel.lesaffre@kuleuven.be
>
Usage
data(tandmobRoos)
Format
a~data frame with 4\,394 rows and the following variables
- IDNR
identification number of a child
- GENDER
character boy or girl
- DOB
character, date of birth in the format DDmmmYY
- PROVINCE
factor, code of the province with
- 0 =
Antwerpen
- 1 =
Vlaams Brabant
- 2 =
Limburg
- 3 =
Oost Vlaanderen
- 4 =
West Vlaanderen
- EDUC
factor, code of the educational system with
- 0 =
Free
- 1 =
Community school
- 2 =
Province/council school
- GIRL
numeric, 0 = boy, 1 = girl
- EBEG.xx
lower limit of the emergence (in years of age) of the permanent tooth xx. In contrast to
tandmob2
, the lower emergence limit for the permanent first molars that were originally left-censored, are adjusted according to the eruption stage (see Leroy, 2005 for more details).xx takes values 16, 26, 36, 46 (permanent first molars).
- EEND.xx
upper limit of the emergence (in years of age) of the permanent tooth xx.
NA
if the emergence was right-censored.xx takes values as for the variable
EBEG.xx
.- FBEG.xx
lower limit for the caries time (in years of age, ‘F’ stands for ‘failure’) of the permanent tooth xx.
NA
if the caries time was left-censored.xx takes values as for the variable
EBEG.xx
.- FEND.xx
upper limit for the caries time (in years of age, ‘F’ stands for ‘failure’) of the permanent tooth xx.
NA
if the caries time was right-censored.xx takes values as for the variable
EBEG.xx
.Unfortunately, for all teeth except 16, 26, 36 and 46 almost all the caries times are right-censored. For teeth 16, 26, 36, 46, the amount of right-censoring is only about 25%.
- TOOTH.xx
numeric, 0 or 1. Equal to 1 if the information concerning the permanent tooth was available, 0 if the permanent tooth xx was removed from the dataset by Kris.
xx takes values 16, 26, 36, 46.
These variables are almost useless for ordinary users.
- Txxd
numeric, 0 or 1. It is equal to 1 if the deciduous tooth xx was decayed, 0 otherwise.
xx takes values 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
- Txxm
numeric, 0 or 1. It is equal to 1 if the deciduous tooth xx was missing due to caries, 0 otherwise.
xx takes values 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
- Txxf
numeric, 0 or 1. It is equal to 1 if the deciduous tooth xx was filled, 0 otherwise.
xx takes values 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
- Txxs
numeric, 0 or 1. It is equal to 1 if the deciduous tooth xx was sound, 0 otherwise.
xx takes values 54, 64, 74, 84 (deciduous first molars), 55, 65, 75, 85 (deciduous second molars).
- SEAL.xx
numeric, 0 or 1. It is equal to 1 if the permanent first molar xx was sealed in pits and fissures (a form of protection), 0 otherwise.
xx takes values 16, 26, 36, 46 (permanent first molars).
- FREQ.BR
numeric, 0 or 1. It is equal to 1 if the child brushes daily the teeth, equal to 0 if he/she brushes less than once a day.
- PLAQUE.xx.1
numeric, 0 or 1. It is equal to 1 if there was occlusal plaque in pits and fissures of the permanent tooth xx. It is equal to 0 if there was either no plaque present or the plaque was present on the total occlusal surface.
xx takes values 16, 26, 36, 46 (permanent first molars).
- PLAQUE.xx.2
numeric, 0 or 1. It is equal to 1 if there was occlusal plaque on the total occlusal surface of the permanent tooth xx. It is equal to 0 if there was either no plaque present or the plaque was present only in pits and fissures.
xx takes values 16, 26, 36, 46 (permanent first molars).
Source
Leuven Biostatistics and statistical Bioinformatics Centre (L-BioStat), Katholieke Universiteit Leuven, Kapucijnenvoer 35, 3000 Leuven, Belgium
URL:
https://gbiomed.kuleuven.be/english/research/50000687/50000696/
Data collection was supported by Unilever, Belgium. The Signal Tandmobiel project comprises the following partners: D. Declerck (Dental School, Catholic University Leuven), L. Martens (Dental School, University Ghent), J. Vanobbergen (Oral Health Promotion and Prevention, Flemish Dental Association), P. Bottenberg (Dental School, University Brussels), E. Lesaffre (Biostatistical Centre, Catholic University Leuven), K. Hoppenbrouwers (Youth Health Department, Catholic University Leuven; Flemish Association for Youth Health Care).
References
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doubly-interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523–533.
Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failurew time model for paired doubly interval-censored data. Statistical Modelling, 6, 3–22.
Leroy, R., Bogaerts, K., Lesaffre, E., and Declerck, D. (2005). Effect of caries experience in primary molars on cavity formation in the adjacent permanent first molar. Caries Research, 39, 342–349.
Vanobbergen, J., Martens, L., Lesaffre, E., and Declerck, D. (2000). The Signal-Tandmobiel project – a longitudinal intervention health promotion study in Flanders (Belgium): baseline and first year results. European Journal of Paediatric Dentistry, 2, 87–96.
Trace plot of MCMC output.
Description
Displays a plot of iterations vs. sampled values for each variable in the chain, with a separate plot per variable.
This is slightly modified version of traceplot
function
of a coda
package to conform to my personal preferences.
Usage
traceplot2(x, chains, bty = "n", main, xlab, ...)
Arguments
x |
|
chains |
indeces of chains from the object that are to be plotted. |
bty , main , xlab , ... |
further arguments passed to the
|
Value
No return value.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Transform single component indeces to double component indeces
Description
Components of a bivariate G-spline can be indexed in several
ways. Suppose that the knots in the first dimension are
\mu_{1,-K_1},\dots,\mu_{1,K_1}
and the knots in the second dimension
\mu_{2,-K_2},\dots,\mu_{2,K_2}.
I.e. we have 2K_1+1
knots in the first dimension and
2K_2+1
knots in the second dimension. Each G-spline
component can have a double index (k_1,k_2)
assigned which means that it corresponds to the knot
(\mu_{1,k_1},\mu_{2,k_2})
or alternatively the same G-spline component can have a~single index
r=(k_2 + K_2)\times(2K_1+1) + k_1 + K_1 + 1
assigned where r
takes values from
1,\dots,K_1\times K_2
. Single indexing is used
for example by files r.sim
and r_2.sim
generated by
functions bayesHistogram
, bayesBisurvreg
,
bayessurvreg2
to save some space.
This function serves to translate single indeces to double indeces using the relationship
k_1 = (r - 1) \mbox{ mod } (2K_1+1) - K_1
k_2 = (r - 1) \mbox{ div } (2K_1+1) - K_2
The function can be used also in one dimensional case when a~simple relationship holds
r = k + K + 1
k = r - 1 - K
Usage
vecr2matr(vec.r, KK)
Arguments
vec.r |
a~vector of single indeces |
KK |
a~vector with numbers of knots on each side of the central
knot for each dimension of the G-spline. The length of |
Value
In bivariate case: a~matrix with two columns and as many rows as the
length of vec.r
.
In univariate case: a~vector with as ,amy components as the length of vec.r
.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
Examples
### Bivariate G-spline
### with 31 knots in each dimension
KK <- c(15, 15)
### First observation in component (-15, -15),
### second observation in component (15, 15),
### third observation in component (0, 0)
vec.r <- c(1, 961, 481)
vecr2matr(vec.r, KK)