Help for package hdbinseg

Type:

Package

Title:

Change-Point Analysis of High-Dimensional Time Series via Binary Segmentation

Version:

1.0.2

Date:

2023-07-27

Description:

Binary segmentation methods for detecting and estimating multiple change-points in the mean or second-order structure of high-dimensional time series as described in Cho and Fryzlewicz (2014) <doi:10.1111/rssb.12079> and Cho (2016) <doi:10.1214/16-EJS1155>.

Depends:

R (≥ 4.2.0)

License:

GPL (≥ 3)

Suggests:

RcppArmadillo

Imports:

Rcpp (≥ 0.12.10), foreach, iterators, doParallel

LinkingTo:

Rcpp, RcppArmadillo

RoxygenNote:

7.2.3

NeedsCompilation:

yes

Packaged:

2023-07-27 12:49:53 UTC; mahrc

Author:

Haeran Cho [aut, cre], Piotr Fryzlewicz [aut]

Maintainer:

Haeran Cho <haeran.cho@bristol.ac.uk>

Repository:

CRAN

Date/Publication:

2023-08-17 13:02:33 UTC

Removing change-points in the mean

Description

(Over-)estimate and remove change-points in the mean for scale estimation and bootstrapping

Usage

clean.cp(z, type = c("dcbs", "sbs"), phi = 0.5, trim = NULL, height = NULL)

Arguments

z

input data matrix, with each row representing the component time series

type

if type = 'dcbs', a binary tree of given height is grown using DCBS algorithm without thresholding, if type = 'sbs', the binary trees is grown using the SBS algorithm with thresholds chosen small

phi, trim, height

see dcbs.alg

Value

a list containing

x

z with potential change-points in the mean removed

z

mat object of an S3 bin.tree object

Double CUSUM Binary Segmentation

Description

Perform the Double CUSUM Binary Segmentation algorithm detecting change points in the mean or second-order structure of the data.

Usage

dcbs.alg(
  x,
  cp.type = c(1, 2)[1],
  phi = 0.5,
  thr = NULL,
  trim = NULL,
  height = NULL,
  tau = NULL,
  temporal = TRUE,
  scales = NULL,
  diag = FALSE,
  B = 1000,
  q = 0.01,
  do.parallel = 4
)

Arguments

x

input data matrix, with each row representing the component time series

cp.type

cp.type = 1 specifies change points in the mean, cp.type = 2 specifies change points in the second-order structure

phi

choice of parameter for weights in Double CUSUM statistic; 0 <= phi <= 1 or phi = -1 allowed with the latter leading to the DC statistic combining phi = 0 and phi = 1/2, see Section 4.1 of Cho (2016) for further details

thr

pre-defined threshold values; when thr = NULL, bootstrap procedure is employed for the threshold selection; when thr != NULL and cp.type = 1, length(thr) should be one, if cp.type = 2, length(thr) should match length(scales)

trim

length of the intervals trimmed off around the change point candidates; trim = NULL activates the default choice (trim = round(log(dim(x)[2])))

height

maximum height of the binary tree; height = NULL activates the default choice (height = floor(log(dim(x)[2], 2)/2))

tau

a vector containing the scaling constant for each row of x; if tau = NULL, a data-driven choice is made which takes into account the presence of possibly multiple mean shifts and temporal dependence when temporal = TRUE

temporal

used when cp.type = 1; if temporal = FALSE, rows of x are scaled by mad estimates, if temporal = TRUE, their long-run variance estimates are used

scales

used when cp.type = 2; negative integers representing Haar wavelet scales to be used for computing nrow(x)*(nrow(x) + 1)/2 dimensional wavelet transformation of x; a small negative integer represents a fine scale

diag

used when cp.type = 2; if diag = TRUE, only changes in the diagonal elements of the autocovariance matrices are searched for

B

used when is.null(thr); number of bootstrap samples for threshold selection

q

used when is.null(thr); indicates the quantile of bootstrap test statistics to be used for threshold selection

do.parallel

used when is.null(thr); number of copies of R running in parallel, if do.parallel = 0, %do% operator is used, see also foreach

Value

S3 bin.tree object, which contains the following fields:

tree

a list object containing information about the nodes at which change points are detected

mat

matrix concatenation of the nodes of tree

ecp

estimated change points

thr

threshold used to construct the tree

References

H. Cho (2016) change point detection in panel data via double CUSUM statistic. Electronic Journal of Statistics, vol. 10, pp. 2000–2038.

Examples

x <- matrix(rnorm(10*100), nrow = 10)
dcbs.alg(x, cp.type = 1, phi=.5, temporal = FALSE, do.parallel = 0)$ecp

x <- matrix(rnorm(100*300), nrow = 100)
x[1:10, 151:300] <- x[1:10, 151:300] + 1
dcbs.alg(x, cp.type = 1, phi=-1, temporal = FALSE, do.parallel = 0)$ecp

Growing a branch for DCBS algorithm

Description

Grow a branch of the binary tree for the Double CUSUM Binary Segmentation without thresholding

Usage

dcbs.branch(input, phi, s, e, trim)

Arguments

input

input data matrix, with each row representing the component time series or their transformation

phi, trim

see dcbs.alg

s, e

start and end of the interval of consideration at a given iteration

Value

a vector containing the information about the branch, including the location of the estimated change point and the corresponding test statistic

Growing a binary tree for DCBS algorithm

Description

Grow a binary tree of a given height via Double CUSUM Binary Segmentation without thresholding

Usage

dcbs.make.tree(input, phi = 0.5, trim = NULL, height = NULL)

Arguments

input

input data matrix, with each row representing the component time series or their transformation

phi

trim, height see dcbs.alg

Value

S3 bin.tree object, which contains the following fields:

tree

a list object containing information about the nodes at which change points are detected

mat

matrix concatenation of the nodes of tree

ecp

estimated change points

thr

threshold used to construct the tree

Bootstrapping for threshold selection in DCBS algorithm

Description

Generate thresholds for DCBS algorithm via bootstrapping

Usage

dcbs.thr(
  z,
  interval = c(1, dim(z)[2]),
  phi = 0.5,
  cp.type = 1,
  do.clean.cp = FALSE,
  temporal = TRUE,
  scales = NULL,
  diag = FALSE,
  sgn = NULL,
  B = 1000,
  q = 0.01,
  do.parallel = 4
)

Arguments

z

input data matrix, with each row representing the component time series

interval

a vector of two containing the start and the end points of the interval from which the bootstrap test statistics are to be calculated

phi, cp.type, temporal, scales, diag, B, q, do.parallel

see dcbs.alg

do.clean.cp

if do.clean.cp = TRUE pre-change point cleaning is performed

sgn

if diag = FALSE, wavelet transformations of the cross-covariances are computed with the matching signs

Value

a numeric value for the threshold

Generating Haar wavelet transformation of time series

Description

Generate Haar wavelet transformation of time series containing information about possible change-points in the second-order structure of time series

Usage

gen.input(x, scales, sq, diag, sgn = NULL)

Arguments

x

input data matrix, with each row representing the component time series

scales

negative integers for wavelet scales, with a small negative integer representing a fine scale

sq

if sq = TRUE, squared root of wavelet periodograms are used for change-point analysis

diag

if diag = TRUE, only changes in the diagonal elements of the autocovariance matrices are searched for

sgn

if diag = FALSE, wavelet transformations of the cross-covariances are computed with the matching signs

Value

matrix of the square root of scaled Haar wavelet periodograms of x

Factor model estimation via Principal Component Analysis

Description

Estimates the components of the factor structure for an input time series, such as loadings and factors, as well as estimating the number of factors.

Usage

get.factor.model(
  z,
  r.max = NULL,
  ic = c("ah", "bn")[1],
  ic.op = 2,
  normalisation = TRUE
)

Arguments

z

input data matrix, with each row representing the component time series

r.max

maximum allowed number of factors

ic

estimator for the factor number; if(ic=='ah') eigenvalue ratio estimator of Ahn and Horenstein (2013) is used, if(ic=='bn') information criterion estimator of Bai and Ng (2002) is used

ic.op

type of the estimator specified by ic

normalisation

if(normalisation==TRUE) rows of z are normalised prior to factor analysis

Value

a list containing

eigvec

eigenvectors of z

eigval

eigenvalues of z

norm.x

row-wise normalised z if(normalisation==TRUE)

r.hat

estimated number of factors

r.max

maximum number of factors used

ic.eval

vector containing information criterion evaluated at r = 0, 1, ..., r.max

mean

row-wise means of z

sd

row-wise standard deviations of z

ic

input ic

ic.op

input ic.op

References

S. C. Ahn and A. R. Horenstein (2013) Eigenvalue ratio test for the number of factors. Econometrica, vol. 81, pp. 1203–1227. J. Bai and S. Ng (2002) Determining the number of factors in approximate factor models. Econometrica, vol. 70, pp. 191–221.

Data-driven selection of the average block size

Description

Computes quantities required for data-driven selection of the average block size for stationary bootstrap

Usage

get.gg(z, M = NULL, C = 2, max.K = 5)

Arguments

z

univariate time series

M

bandwidth, by default, M = NULL to be automatically selected as in Politis and White (2004)

C

C = 2 is the default value chosen for automatic bandwidth selection in Politis and White (2004)

max.K

max.K = 5 is the default value chosen for automatic bandwidth selection in Politis and White (2004)

Value

Estimates for the two quantities required for average block size selection.

References

D. N. Politis and H. White (2004) Automatic block-length selection for the dependent bootstrap. Econometric Reviews, vol. 23, pp. 53–70.

Sparsified Binary Segmentation

Description

Perform the Sparsified Binary Segmentation algorithm detecting change-points in the mean or second-order structure of the data.

Usage

sbs.alg(
  x,
  cp.type = c(1, 2)[1],
  thr = NULL,
  trim = NULL,
  height = NULL,
  tau = NULL,
  temporal = TRUE,
  scales = NULL,
  diag = FALSE,
  B = 1000,
  q = 0.01,
  do.parallel = 4
)

Arguments

x

input data matrix, with each row representing the component time series

cp.type

cp.type = 1 specifies change-points in the mean, cp.type = 2 specifies change-points in the second-order structure

thr

pre-defined threshold values; when thr = NULL, bootstrap procedure is employed for the threshold selection; when thr != NULL and cp.type = 1, length(thr) should match nrow(x), if cp.type = 2, length(thr) should match nrow(x)*(nrow(x)+1)/2*length(scales)

trim

length of the intervals trimmed off around the change-point candidates; trim = NULL activates the default choice (trim = round(log(dim(x)[2])))

height

maximum height of the binary tree; height = NULL activates the default choice (height = floor(log(dim(x)[2], 2)/2))

tau

temporal

used when cp.type = 1; if temporal = FALSE, rows of x are scaled by mad estimates, if temporal = TRUE, their long-run variance estimates are used

scales

used when cp.type = 2; negative integers representing Haar wavelet scales to be used for computing nrow(x)*(nrow(x)+1)/2 dimensional wavelet transformation of x; a small negative integer represents a fine scale

diag

used when cp.type = 2; if diag = TRUE, only changes in the diagonal elements of the autocovariance matrices are searched for

B

used when is.null(thr); number of bootstrap samples for threshold selection

q

used when is.null(thr); quantile of bootstrap test statistics to be used for threshold selection

do.parallel

used when is.null(thr); number of copies of R running in parallel, if do.parallel = 0, %do% operator is used, see also foreach

Value

S3 bin.tree object, which contains the following fields:

tree

a list object containing information about the nodes at which change-points are detected

mat

matrix concatenation of the nodes of tree

ecp

estimated change-points

thr

threshold used to construct the tree

References

H. Cho and P. Fryzlewicz (2014) Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. JRSSB, vol. 77, pp. 475–507.

Examples

x <- matrix(rnorm(20*300), nrow = 20)
sbs.alg(x, cp.type = 2, scales = -1, diag = TRUE, do.parallel = 0)$ecp

x <- matrix(rnorm(100*300), nrow = 100)
x[1:10, 151:300] <- x[1:10, 151:300]*sqrt(2)
sbs.alg(x, cp.type = 2, scales = -1, diag = TRUE, do.parallel = 0)$ecp

Growing a binary tree for SBS algorithm

Description

Grow a binary tree via Sparsified Binary Segmentation

Usage

sbs.make.tree(input, tau = rep(1, nrow(input)), thr, trim, height)

Arguments

input

input data matrix, with each row representing the component time series or their transformation

tau

scaling terms the rows of input

thr, trim, height

see sbs.alg

Value

S3 bin.tree object, which contains the following fields:

tree

a list object containing information about the nodes at which change-points are detected

mat

matrix concatenation of the nodes of tree

ecp

estimated change-points

thr

threshold used to construct the tree

Bootstrapping for threshold selection in SBS algorithm

Description

Generate thresholds for SBS algorithm via bootstrapping

Usage

sbs.thr(
  z,
  interval = c(1, dim(z)[2]),
  cp.type = 1,
  do.clean.cp = TRUE,
  scales = NULL,
  diag = FALSE,
  sgn = NULL,
  B = 1000,
  q = 0.01,
  do.parallel = 4
)

Arguments

z

input data matrix, with each row representing the component time series

interval

a vector of two containing the start and the end points of the interval from which the bootstrap test statistics are to be calculated

cp.type, scales, diag, B, q, do.parallel

see sbs.alg

do.clean.cp

if do.clean.cp = TRUE pre-change-point cleaning is performed

sgn

if diag = FALSE, wavelet transformations of the cross-covariances are computed with the matching signs

Value

if cp.type = 1, a vector of length nrow(z), each containing the threshold applied to the CUSUM statistics from the corresponding coordinate of z if cp.type = 2, a vector of length length(scales)*nrow(z) (when diag = TRUE) or length(scales)*nrow(z)*(nrow(z)+1)/2 (when diag = FALSE), each containing the threshold applied to the CUSUM statistics of the corresponding wavelet transformation of z

Searching for a change-point in a branch

Description

Searches for a change-point in a branch of a binary tree grown by the Sparsified Binary Segmentation

Usage

search.b(stat, trim)

Arguments

stat

aggregated CUSUM statistics

trim

see sbs.alg

Value

a list containing

b

a possible location of the change-point

test.stat

test statistic

FLAG

should the branch be grown?

Flat-top kernel

Description

Flat-top kernel

Usage

tri.kern(h, len = 0)

Arguments

h

bandwidth

len

desired length of the vector containing the flat-top kernel

Value

a vector containing half of the flat-top kernel