Type: | Package |
Title: | Inference for High-Dimensional Mixture Transition Distribution Models |
Version: | 0.1.0 |
Description: | Estimates parameters in Mixture Transition Distribution (MTD) models, a class of high-order Markov chains. The set of relevant pasts (lags) is selected using either the Bayesian Information Criterion or the Forward Stepwise and Cut algorithms. Other model parameters (e.g. transition probabilities and oscillations) can be estimated via maximum likelihood estimation or the Expectation-Maximization algorithm. Additionally, 'hdMTD' includes a perfect sampling algorithm that generates samples of an MTD model from its invariant distribution. For theory, see Ost & Takahashi (2023) http://jmlr.org/papers/v24/22-0266.html. |
URL: | https://github.com/MaiaraGripp/hdMTD |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 4.1.0) |
Imports: | methods, dplyr, purrr |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-04-20 20:59:01 UTC; maiar |
Author: | Maiara Gripp [aut, cre], Guilherme Ost [ths], Giulio Iacobelli [ths] |
Maintainer: | Maiara Gripp <maiara@dme.ufrj.br> |
Repository: | CRAN |
Date/Publication: | 2025-04-24 07:20:02 UTC |
EM estimation of MTD parameters
Description
Estimation of MTD parameters through the Expectation Maximization (EM) algorithm.
Usage
MTDest(
X,
S,
M = 0.01,
init,
iter = FALSE,
nIter = 100,
A = NULL,
oscillations = FALSE
)
Arguments
X |
A vector or single-column data frame containing an MTD chain sample ( |
S |
A numeric vector of positive integers. Typically, |
M |
A stopping point for the EM algorithm. If |
init |
A list with initial parameters: |
iter |
Logical. If |
nIter |
An integer positive number with the maximum number of iterations. |
A |
A vector with positive integers representing the state space. If not informed,
this function will set |
oscillations |
Logical. If |
Details
Regarding the M
parameter: it functions as a stopping
criterion within the EM algorithm. When the difference between
the log-likelihood computed with the newly estimated parameters
and that computed with the previous parameters falls below M
,
the algorithm halts. Nevertheless, if the value of nIter
(which represents the maximum number of iterations) is smaller
than the number of iterations required to meet the M
criterion,
the algorithm will conclude its execution when nIter
is reached.
To ensure that the M
criterion is effectively utilized, we
recommend using a higher value for nIter
, which is set to a
default of 100.
Concerning the init
parameter, it is expected to be a list
comprising either 2 or 3 entries. These entries consist of:
an optional vector named p0
, representing an independent
distribution (the probability in the first entry of p0
must be
that of the smallest element in A
and so on), a required list
of matrices pj
, containing a stochastic matrix for each
element of S
( the first matrix must refer to the smallest
element of S
and so on), and a vector named lambdas
representing
the weights, the first entry must be the weight for p0
, and then one entry
for each element in pj
list. If your MTD model does not have an independent
distribution p0
, set init$lambda[1]=0
.
Value
A list with the estimated parameters of the MTD model.
References
Lebre, Sophie and Bourguignon, Pierre-Yves. (2008). An EM algorithm for estimation in the Mixture Transition Distribution model. Journal of Statistical Computation and Simulation, 78(1), 1-15. doi:10.1080/00949650701266666
Examples
# Simulating data.
# Model:
MTD <- MTDmodel(Lambda=c(1,10),A=c(0,1),lam0=0.01)
# Sampling a chain:
X <- hdMTD::perfectSample(MTD,N=2000)
# Initial Parameters:
init <- list('p0'=c(0.4,0.6),'lambdas'=c(0.05,0.45,0.5),
'pj'=list(matrix(c(0.2,0.8,0.45,0.55),byrow = TRUE,ncol=2),
matrix(c(0.25,0.75,0.3,0.7),byrow = TRUE,ncol=2)))
# MTDest() ------------------------------------
MTDest(X,S=c(1,10),M=1,init)
MTDest(X,S=c(1,10),init=init,iter = TRUE)
MTDest(X,S=c(1,10),init=init,iter = TRUE,nIter=5)
MTDest(X,S=c(1,10),init=init,oscillations = TRUE)
Creates a Mixture Transition Distribution (MTD) Model
Description
Generates an MTD model as an object of class MTD
given a set of parameters.
Usage
MTDmodel(
Lambda,
A,
lam0 = NULL,
lamj = NULL,
pj = NULL,
p0 = NULL,
single_matrix = FALSE,
indep_part = TRUE
)
Arguments
Lambda |
A numeric vector of positive integers representing the relevant lag set. The elements will be sorted from smallest to greatest. The smallest number represents the latest (most recent) time in the past, and the largest number represents the earliest time in the past. |
A |
A vector with nonnegative integers representing the state space. |
lam0 |
A numeric value in |
lamj |
A numeric vector of weights for the transition probability matrices in |
pj |
A list with |
p0 |
A probability vector for the independent component of the MTD model. If |
single_matrix |
Logical. If |
indep_part |
Logical. If |
Details
The resulting MTD object can be used by functions such as oscillation()
, which retrieves the
model's oscillation, and perfectSample()
, which will sample an MTD Markov chain from its invariant
distribution.
Value
A list of class MTD
containing:
P
The transition probability matrix of the MTD model.
lambdas
A vector with MTD weights (
lam0
andlamj
).pj
A list of stochastic matrices defining conditional transition probabilities.
p0
The independent probability distribution.
Lambda
The vector of relevant lags.
A
The state space.
Examples
MTDmodel(Lambda=c(1,3),A=c(4,8,12))
MTDmodel(Lambda=c(2,4,9),A=c(0,1),lam0=0.05,lamj=c(0.35,0.2,0.4),
pj=list(matrix(c(0.5,0.7,0.5,0.3),ncol=2)),p0=c(0.2,0.8),single_matrix=TRUE)
MTDmodel(Lambda=c(2,4,9),A=c(0,1),lam0=0.05,
pj=list(matrix(c(0.5,0.7,0.5,0.3),ncol=2)),single_matrix=TRUE,indep_part=FALSE)
Checks a sample
Description
Checks if a sample is a suitable argument for some functions within the package.
Usage
checkSample(X)
Arguments
X |
A vector, a single-column data frame, a list, or a matrix with a single row or a single column. Must be composed by nonnegative integers. |
Value
Returns the sample as a vector or identifies any possible sample problems.
Counts sequences of length d+1 in a sample
Description
Creates a tibble containing all unique sequences of length d+1
found in
the sample, along with their absolute frequencies.
Usage
countsTab(X, d)
Arguments
X |
A numeric vector, a single-column data frame, or a list with a sample from a Markov chain. The first element must be the most recent observation. |
d |
A positive integer specifying the number of elements in each sequence,
which will be |
Details
The function generates a tibble with d+2
columns. In the first
d+1
columns, each row displays a unique sequence of size d+1
observed in the sample. The last column, called Nxa
, contains the
number of times each of these sequences appeared in the sample.
The number of rows in the output varies between 1
and |A|^{d+1}
,
where |A|
is the number of unique states in X
, since it depends
on the number of unique sequences that appear in the sample.
Value
A tibble with all observed sequences of length d+1
and their
absolute frequencies.
Examples
countsTab(c(1,2,2,1,2,1,1,2,1,2), 3)
# Using test data.
countsTab(testChains[, 1], 2)
The total variation distance between distributions
Description
Calculates the total variation distance between distributions conditioned in a given past sequence.
Usage
dTV_sample(S, j, A = NULL, base, lenA = NULL, A_pairs = NULL, x_S)
Arguments
S |
A numeric vector of positive integers (or |
j |
A positive integer representing a lag in the |
A |
A vector of unique positive integers (state space) with at least two elements. |
base |
A data frame with sequences of elements from |
lenA |
An integer |
A_pairs |
A two-column matrix with all unique pairs of elements from |
x_S |
A vector of length |
Details
This function computes the total variation distance between distributions
found in base
, which is expected to be the output of the function freqTab()
.
Therefore, base
must follow a specific structure (e.g., column names must
match, and a column named qax_Sj, containing transition distributions, must be
present). For more details on the output structure of freqTab()
, refer to its
documentation..
If you provide the state space A
, the function calculates:
lenA <- length(A)
and A_pairs <- t(utils::combn(A, 2))
.
Alternatively, you can input lenA
and A_pairs
directly and let
A <- NULL
, which is useful in loops to improve efficiency.
Value
Returns a vector of total variation distances, where each entry corresponds to the
distance between a pair of distributions conditioned on the same fixed past x_S
,
differing only in the symbol indexed by j
, which varies across all distinct pairs
of elements in A
.The output has length equal to the number of unique pairs
in A_pairs
.
Examples
#creating base argument through freqTab function.
pbase <- freqTab(S=c(1,4),j=2,A=c(1,2,3),countsTab = countsTab(testChains[,2],d=5))
dTV_sample(S=c(1,2),j=4,A=c(1,2,3),base=pbase,x_S=c(2,3))
pbase <- freqTab(S=NULL,j=1,A=c(1,2,3),countsTab = countsTab(testChains[,2],d=5))
dTV_sample(S=NULL,j=1,A=c(1,2,3),base=pbase)
A tibble containing sample sequence frequencies and estimated probabilities
Description
This function returns a tibble containing the sample sequences, their frequencies and the estimated transition probabilities.
Usage
freqTab(S, j = NULL, A, countsTab, complete = TRUE)
Arguments
S |
A numeric vector of positive integers or |
j |
An integer or |
A |
A vector with nonnegative integers. Must have at least two different entries.
|
countsTab |
A tibble or a data frame with all sequences of length d+1 that
appear in the sample, and their absolute frequency. This tibble is typically
generated by the function |
complete |
Logical. If |
Details
The parameters S
and j
determine which columns of countsTab
are retained in the output. Specifying a lag j
is optional. All lags can
be specified via S
, while leaving j = NULL
(default). The output
remains the same as when specifying S
and j
separately. The
inclusion of j
as a parameter improves clarity within the package's
algorithms. Note that j
cannot be an element of S
.
Value
A tibble where each row represents a sequence of elements from A
.
The initial columns display each sequence symbol separated into columns
corresponding to their time indexes. The remaining columns show the sample
frequencies of the sequences and the MLE (Maximum Likelihood Estimator)
of the transition probabilities.
Examples
freqTab(S=c(1,4),j=2,A=c(1,2,3),countsTab = countsTab(testChains[,2],d=5))
#Equivalent to freqTab(S=c(1,2,4),j=NULL,A=c(1,2,3),countsTab = countsTab(testChains[,2],d=5))
Inference in MTD models
Description
This function estimates the relevant lag set in a Mixture Transition Distribution (MTD) model using one of the available methods. By default, it applies the Forward Stepwise ("FS") method, which is particularly useful in high-dimensional settings. The available methods are:
"FS" (Forward Stepwise): selects the lags by a criteria that depends on their oscillations.
"CUT": a method that selects the relevant lag set based on a predefined threshold.
"FSC" (Forward Stepwise and Cut): applies the "FS" method followed by the "CUT" method.
"BIC": selects the lag set using the Bayesian Information Criterion.
Usage
hdMTD(X, d, method = "FS", ...)
Arguments
X |
A vector or single-column data frame containing a chain sample. |
d |
A positive integer representing an upper bound for the chain order. |
method |
A character string indicating the method for estimating the relevant lag set. The available methods are: "FS" (default), "FSC", "CUT", and "BIC". See the Details section and the documentation of the corresponding method functions for more information. |
... |
Additional arguments for the selected method. If not specified, default values will be used (see Details ). |
Details
The function dynamically calls the corresponding method function (e.g., hdMTD_FSC()
for
method = "FSC"
). Additional parameters specific to each method can be provided via ...
,
and default values are used for unspecified parameters.
#' This function serves as a wrapper for the method-specific functions:
-
hdMTD_FS()
, formethod = "FS"
-
hdMTD_FSC()
, formethod = "FSC"
-
hdMTD_CUT()
, formethod = "CUT"
-
hdMTD_BIC()
, formethod = "BIC"
Any additional parameters (...
) must match those accepted by the corresponding method function.
If a parameter value is not explicitly provided, a default value is used.
The main default parameters are:
-
S = seq_len(d)
: Used in "BIC" or "CUT" methods. -
l = d
. Required in "FS" or "FSC" methods. -
alpha = 0.05
,mu = 1
. Used in "CUT" or "FSC" methods. -
xi = 0.5
. Used in "CUT", "FSC" or "BIC" methods. -
minl = 1
,maxl = length(S)
,byl = FALSE
. Used in "BIC" method. All default values are specified in the documentation of the method-specific functions.
Value
A vector containing the estimated relevant lag set.
Examples
X <- testChains[,1]
hdMTD(X = X, d = 5, method = "FS", l = 2)
hdMTD(X = X, d = 5, method = "BIC", xi = 1, minl = 3, maxl = 3)
The Bayesian Information Criterion (BIC) method for inference in MTD models
Description
A function for estimating the relevant lag set \Lambda
of a Markov chain using
Bayesian Information Criterion (BIC). This means that this method selects the set of lags
that minimizes a penalized log likelihood for a given sample, see References below for
details on the method.
Usage
hdMTD_BIC(
X,
d,
S = seq_len(d),
minl = 1,
maxl = length(S),
xi = 1/2,
A = NULL,
byl = FALSE,
BICvalue = FALSE,
single_matrix = FALSE,
indep_part = TRUE,
zeta = maxl,
warning = FALSE,
...
)
Arguments
X |
A vector or single-column data frame containing a chain sample ( |
d |
A positive integer representing an upper bound for the chain order. |
S |
A numeric vector of positive integers from which this function will select
a set of relevant lags. Typically, |
minl |
A positive integer. |
maxl |
A positive integer equal to or greater than |
xi |
The BIC penalization term constant. Defaulted to 1/2. A smaller |
A |
A vector with positive integers representing the state space. If not informed,
this function will set |
byl |
Logical. If |
BICvalue |
Logical. If |
single_matrix |
Logical. If |
indep_part |
Logical. If |
zeta |
A positive integer representing the number of distinct matrices |
warning |
Logical. If |
... |
Additional arguments (not used in this function, but maintained for compatibility with |
Details
Note that the upper bound for the order of the chain (d
) affects the estimation
of the transition probabilities. If we run the function with a certain order parameter d
,
only the sequences of length d
that appeared in the sample will be counted. Therefore,
all transition probabilities, and hence all BIC values, will be calculated with respect to
that d
. If we use another value for d
to run the function, even if the output
agrees with that of the previous run, its BIC value might change a little.
The parameter zeta
indicates the the number of distinct matrices pj in the MTD.
If zeta = 1
, all matrices p_j
are identical; if zeta = 2
there exists
two groups of distinct matrices and so on. The largest value for zeta
is maxl
since this is the largest number of matrices p_j
. When minl<maxl
,
for each minl
\leq
l
\leq
maxl
, zeta = min(zeta,l)
.
If single_matrix = TRUE
then zeta
is set to 1.
Value
Returns a vector with the estimated relevant lag set using BIC. It might return more
than one set if minl < maxl
and byl = TRUE
. Additionally, it can return the value
of the penalized likelihood for the outputted lag sets if BICvalue = TRUE
.
References
Imre Csiszár, Paul C. Shields. The consistency of the BIC Markov order estimator. The Annals of Statistics, 28(6), 1601-1619. doi:10.1214/aos/1015957472
Examples
X <- testChains[, 1]
hdMTD_BIC (X, d = 6, minl = 1, maxl = 1)
hdMTD_BIC (X,d = 3,minl = 1, maxl = 2, BICvalue = TRUE)
The CUT method for inference in MTD models
Description
A function that estimates the set of relevant lags of an MTD model using the CUT method.
Usage
hdMTD_CUT(
X,
d,
S = 1:d,
alpha = 0.05,
mu = 1,
xi = 0.5,
A = NULL,
warning = FALSE,
...
)
Arguments
X |
A vector or single-column data frame containing a chain sample ( |
d |
A positive integer representing an upper bound for the chain order. |
S |
A numeric vector of distinct positive integers from which this function will select
a set of relevant lags. Should be a subset of |
alpha |
A positive real number used in the CUT threshold (which determines if two
distributions can be considered different). The larger the |
mu |
A positive real number such that |
xi |
A positive real number, |
A |
A vector with positive integers representing the state space. If not informed,
this function will set |
warning |
Logical. If |
... |
Additional arguments (not used in this function, but maintained for compatibility with |
Details
The "Forward Stepwise and Cut" (FSC) is an algorithm for inference in Mixture Transition Distribution (MTD) models. It consists in the application of the "Forward Stepwise" (FS) step followed by the CUT algorithm. This method and its steps where developed by Ost and Takahashi and are specially useful for inference in high-order MTD Markov chains. This specific function will only apply the CUT step of the algorithm and return an estimated relevant lag set.
Value
Returns a set of relevant lags estimated using the CUT algorithm.
References
Ost, G. & Takahashi, D. Y. (2023). Sparse Markov models for high-dimensional inference. Journal of Machine Learning Research, 24(279), 1-54. http://jmlr.org/papers/v24/22-0266.html
Examples
X <- testChains[,3]
hdMTD_CUT(X,4,alpha=0.02,mu=1,xi=0.4)
hdMTD_CUT(X,d=6,S=c(1,4,6),alpha=0.0065)
The Forward Stepwise (FS) method for inference in MTD models
Description
A function that estimates the set of relevant lags of an MTD model using the FS method.
Usage
hdMTD_FS(X, d, l, A = NULL, elbowTest = FALSE, warning = FALSE, ...)
Arguments
X |
A vector or single-column data frame containing a chain sample ( |
d |
A positive integer representing an upper bound for the chain order. |
l |
A positive integer specifying the number of lags to be selected as relevant. |
A |
A vector with positive integers representing the state space. If not informed,
this function will set |
elbowTest |
Logical. If TRUE, the function applies an alternative stopping criterion to determine the length of the set of relevant lags. See Details for more information. |
warning |
Logical. If |
... |
Additional arguments (not used in this function, but maintained for compatibility with |
Details
The "Forward Stepwise" (FS) algorithm is the first step of the "Forward Stepwise and Cut"
(FSC) algorithm for inference in Mixture Transition Distribution (MTD) models.
This method was developed by Ost and Takahashi
This specific function will only apply the FS step of the algorithm and return an estimated
relevant lag set of length l
.
This method iteratively selects the most relevant lags based on a certain quantity \nu
.
In the first step, the lag in 1:d
with the greatest \nu
is deemed important.
This lag is included in the output, and using this knowledge, the function proceeds to seek
the next important lag (the one with the highest \nu
among the remaining ones).
The process stops when the output vector reaches length l
if elbowTest=FALSE
.
If elbowTest = TRUE
, the function will store these maximum \nu
values
at each iteration, and output only the lags that appear before the one with smallest
\nu
among them.
Value
A numeric vector containing the estimated relevant lag set using FS algorithm.
References
Ost, G. & Takahashi, D. Y. (2023). Sparse Markov models for high-dimensional inference. Journal of Machine Learning Research, 24(279), 1-54. http://jmlr.org/papers/v24/22-0266.html
Examples
X <- testChains[,1]
hdMTD_FS(X,d=5,l=2)
hdMTD_FS(X,d=4,l=3,elbowTest = TRUE)
Forward Stepwise and Cut method for inference in MTD models
Description
A function for inference in MTD Markov chains with FSC method. This function estimates the relevant
lag set \Lambda
of an MTD model through the FSC algorithm.
Usage
hdMTD_FSC(X, d, l, alpha = 0.05, mu = 1, xi = 0.5, A = NULL, ...)
Arguments
X |
A vector or single-column data frame containing a chain sample ( |
d |
A positive integer representing an upper bound for the chain order. |
l |
A positive integer that sets the number of elements in the output vector. |
alpha |
A positive real number used in the CUT threshold (which determines if two
distributions can be considered different). The larger the |
mu |
A positive real number such that |
xi |
A positive real number, |
A |
A vector with positive integers representing the state space. If not informed,
this function will set |
... |
Additional arguments (not used in this function, but maintained for compatibility with |
Details
The "Forward Stepwise and Cut" (FSC) is an algorithm for inference in Mixture Transition Distribution (MTD) models. It consists in the application of the "Forward Stepwise" (FS) step followed by the CUT algorithm. This method and its steps where developed by Ost and Takahashi and are specially useful for inference in high-order MTD Markov chains.
Value
Returns a vector with the estimated relevant lag set using FSC algorithm.
References
Ost, G. & Takahashi, D. Y. (2023). Sparse Markov models for high-dimensional inference. Journal of Machine Learning Research, 24(279), 1-54. http://jmlr.org/papers/v24/22-0266.html
Examples
X <- testChains[,1]
hdMTD_FSC(X,4,3,alpha=0.02)
hdMTD_FSC(X,4,2,alpha=0.001)
Oscillations of an MTD Markov chain
Description
Calculates the oscillations of an MTD model object or estimates the oscillations of a chain sample.
Usage
oscillation(x, ...)
Arguments
x |
Must be an MTD object or a chain sample. |
... |
Additional parameters that might be required. Such as:
|
Details
The oscillation for a certain lag j
of an MTD model
( \{ \delta_j:\ j \in \Lambda \}
), is the product of the weight \lambda_j
multiplied by the maximum of the total variation distance between the distributions in a
stochastic matrix p_j
.
\delta_j = \lambda_j\max_{b,c \in \mathcal{A}} d_{TV}(p_j(\cdot | b), p_j(\cdot | c)).
So, if x
is an MTD object, the parameters \Lambda
, \mathcal{A}
, \lambda_j
,
and p_j
are inputted through, respectively, the entries Lambda
, A
,
lambdas
and the list pj
of stochastic matrices. Hence, an oscillation \delta_j
may be calculated for all j \in \Lambda
.
If we wish to estimate the oscillations from a sample, then x
must be a chain,
and S
, a vector representing a set of lags, must be informed. This way the transition
probabilities can be estimated. Let \hat{p}(\cdot| x_S)
symbolize an estimated distribution
in \mathcal{A}
given a certain past x_S
( which is a sequence of elements of
\mathcal{A}
where each element occurred at a lag in S
), and
\hat{p}(\cdot|b_jx_S)
an estimated distribution given past x_S
and that the symbol
b\in\mathcal{A}
occurred at lag j
.
If N
is the sample size, d=
max(S)
and N(x_S)
is the
number of times the sequence x_S
appeared in the sample, then
\delta_j = \max_{c_j,b_j \in \mathcal{A}} \frac{1}{N-d}\sum_{x_{S} \in \mathcal{A}^{S}} N(x_S)d_{TV}(\hat{p}(. | b_jx_S), \hat{p}(. | c_jx_S) )
is the estimated oscillation for a lag j \in \{1,\dots,d\}\setminus
S
. Note that \mathcal{A}^S
is the space of
sequences of \mathcal{A}
indexed by S
.
Value
If the x
parameter is an MTD object, it will provide the oscillations for
each element in Lambda
. In case x
is a chain sample, it estimates the oscillations
for a user-inputted set S
of lags.
Examples
oscillation( MTDmodel(Lambda=c(1,4),A=c(2,3) ) )
oscillation(MTDmodel(Lambda=c(1,4),A=c(2,3),lam0=0.01,lamj=c(0.49,0.5),
pj=list(matrix(c(0.1,0.9,0.9,0.1),ncol=2)), single_matrix=TRUE))
Perfectly samples an MTD Markov chain
Description
Samples an MTD Markov Chain from the stationary distribution.
Usage
perfectSample(MTD, N = NULL)
Arguments
MTD |
An MTD object, see |
N |
The sample size. If |
Details
This perfect sample algorithm requires that the MTD model has
an independent distribution (p0) with a positive weight (i.e., MTD$lambdas["lam0"]>0
which
means \lambda_0>0
).
Value
Returns a sample from an MTD model (the first element is the most recent).
Examples
perfectSample(MTDmodel(Lambda=c(1,4), A = c(0,2)), N = 200 )
perfectSample(MTDmodel(Lambda=c(2,5), A = c(1,2,3)), N = 1000 )
Estimated transition probabilities
Description
Computes the Maximum Likelihood estimators (MLE) for an MTD Markov chain with
relevant lag set S
.
Usage
probs(X, S, matrixform = FALSE, A = NULL, warning = FALSE)
Arguments
X |
A vector or single-column data frame containing a sample of a Markov chain ( |
S |
A numeric vector of unique positive integers. Typically, |
matrixform |
Logical. If |
A |
A numeric vector of distinct integers representing the state space.
If not provided, this function will set |
warning |
Logical. If |
Details
The probabilities are estimated as:
\hat{p}(a | x_S) = \frac{N(x_S a)}{N(x_S)}
where N(x_S a)
is the number of times the sequence x_S
appeared in the sample
followed by a
, and N(x_S)
is the number of times x_S
appeared
(followed by any state). If N(x_S) = 0
, the probability is set to 1 / |A|
(assuming a uniform distribution over A
).
Value
A data frame or a matrix containing estimated transition probabilities:
If
matrixform = FALSE
, the function returns a data frame with three columns:The past sequence
x_S
(a concatenation of past states).The current state
a
.The estimated probability
\hat{p}(a | x_S)
.
If
matrixform = TRUE
, the function returns a stochastic transition matrix, where rows correspond to past sequencesx_S
and columns correspond to states inA
.
Examples
X <- testChains[, 3]
probs(X, S = c(1, 30))
probs(X, S = c(1, 15, 30))
Rain data set for the city of Canberra, Australia
Description
A data frame with the rainfall history in the city of Canberra, Australia. The data spans from 01/11/2007 to 25/06/2017.
Usage
raindata
Format
A data frame with 3525 rows and 2 columns. Each row corresponds to a day specified in column 1 ("Date"). The value in column 2 ("RainToday") is 0 if no rain was recorded in the city of Canberra that day, and 1 otherwise.
- Date
Date in YYYY-MM-DD format.
- RainToday
Binary indicator (0 = no rain, 1 = rain).
Note
The original BOM data is subject to their Terms of Use. For direct access, visit the BOM website manually: https://www.bom.gov.au/climate/data/ (may require browser access).
Source
Original data source: Australian Bureau of Meteorology (BOM). Accessed via: Kaggle (https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package).
Examples
data(raindata)
Data with sleeping patterns
Description
The dataset contains 136,925 rows and 7 columns, representing the sleeping patterns of 151 patients over the course of one night, with measurements taken at 30-second intervals. It is a collection of 151 whole-night polysomno-graphic (PSG) sleep recordings (85 Male, 66 Female, mean age of 53.9 ± 15.4) collected during 2018 at the Haaglanden Medisch Centrum (HMC, The Netherlands) sleep center.
Usage
sleepscoring
Format
A tbl_df object with 136,925 rows representing the sleeping patterns of 151 patients.
- Patient
Identifies the patient.
- Date
Date when the measurements where made.
- Time
Time each measurement was made.
- Recorging.onset
Time, in seconds, since the beginning of the recordings.
- Duration
The duration of each lag between recordings, in seconds.
- SleepStage
The annotated sleeping stage. 'W' refers to wakefulness, 'R' to REM sleep, and 'N1', 'N2', and 'N3' refer to non-REM stages 1, 2, and 3 respectively.
- ConscientLevel
A measurement of the level of consciousness during different sleep stages. 0 indicates Wake, 1 represents REM sleep, and 2, 3, and 4 correspond to N1, N2, and N3 stages respectively.
Source
Alvarez-Estevez, D. & Rijsman, R. M. (2022). Haaglanden Medisch Centrum sleep staging database (version 1.1). PhysioNet. doi:10.13026/t79q-fr32
Alvarez-Estevez, D. & Rijsman, R. M. (2021). Inter-database validation of a deep learning approach for automatic sleep scoring. PLOS ONE, 16(8), 1-27. doi:10.1371/journal.pone.0256111
Goldberger, A. L., Amaral, L. A. N., Glass, L., Hausdorff, J. M., Ivanov, P. C., Mark, R. G., Mietus, J. E., Moody, G. B., Peng, C. K. & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23), e215-e220. doi:10.1161/01.CIR.101.23.e215
Examples
data(sleepscoring)
Maximum temperatures in the city of Brasília, Brazil.
Description
A data frame with the maximum temperature of the last hour, by each hour, in the city of Brasília, Brazil. The data spans from 01/01/2003 to 31/08/2024.
Usage
tempdata
Format
A data frame with 189936 rows and 3 columns. Each row corresponds to a time in a day specified in columns 2 ("TIME") and 1 ("DATE") respectively. The value in column 3 ("MAXTEMP") is the maximum temperature measured in the last hour, in Celsius (Cº), in the city of Brasília, the capital of Brazil, located in the central-western part of the country.
- DATE
The day, from 01/01/2003 to 31/08/2024
- TIME
The time, form 00:00 to 23:00 each day
- MAXTEMP
The maximum temperature measured in the last hour in Celsius
Source
Meteorological data provided by INMET (National Institute of Meteorology, Brazil). Data collected from automatic weather station in Brasília (latitude: -15.79°, longitude: -47.93°, altitude: 1159.54 m). Available at: https://bdmep.inmet.gov.br/
Examples
data(tempdata)
MTD samples for tests
Description
A tibble with chains perfectly sampled from a MTD model. Each chain was sampled from the same MTD model. Hence the differences between the samples are due to randomness within the perfect sample algorithm.
Usage
testChains
Format
A tibble with 3000 rows and 3 perfectly sampled chains.
- testChain1
perfectSample1,N=3000
- testChain2
perfectSample2,N=3000
- testChain3
perfectSample3,N=3000
Details
The MTD model from which the chains were sampled was created as follows:
set.seed(1)
pj <- list("p-1"=matrix(c(0.1,0.1,0.8,0.4,0.4,0.2,0.5,0.3,0.2), byrow = T,ncol = 3),
"p-30"=matrix(c(0.05,0.2,0.75,0.4,0.4,0.2,0.3,0.3,0.4), byrow = T,ncol = 3))
MTDseed1 <- MTDmodel(Lambda=c(1,30),A=c(1,2,3),lam0=0.05, lamj = c(0.35,0.6),pj=pj)
testChain1 <- perfectSample(MTDseed1,3000)
testChain2 <- perfectSample(MTDseed1,3000)
testChain3 <- perfectSample(MTDseed1,3000)
testChains <- dplyr::as_tibble(cbind(testChain1,testChain2,testChain3))
Source
Created in-house to serve as example.
Examples
data(testChains)