Title: | Select an Optimal Block-Length to Bootstrap Dependent Data (Block Bootstrap) |
---|---|
Description: | A set of functions to select the optimal block-length for a dependent bootstrap (block-bootstrap). Includes the Hall, Horowitz, and Jing (1995) <doi:10.1093/biomet/82.3.561> subsampling-based cross-validation method, the Politis and White (2004) <doi:10.1081/ETC-120028836> Spectral Density Plug-in method, including the Patton, Politis, and White (2009) <doi:10.1080/07474930802459016> correction, and the Lahiri, Furukawa, and Lee (2007) <doi:10.1016/j.stamet.2006.08.002> nonparametric plug-in method, with a corresponding set of S3 plot methods. |
Authors: | Alec Stashevsky [aut, cre] |
Maintainer: | Alec Stashevsky <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.2.9000 |
Built: | 2025-03-09 10:21:32 UTC |
Source: | https://github.com/alec-stashevsky/blocklength |
Perform the Hall, Horowitz, and Jing (1995) "HHJ" cross-validation algorithm to select the optimal block-length for a bootstrap on dependent data (block-bootstrap). Dependent data such as stationary time series are suitable for usage with the HHJ algorithm.
hhj( series, nb = 100L, n_iter = 10L, pilot_block_length = NULL, sub_sample = NULL, k = "two-sided", bofb = 1L, search_grid = NULL, grid_step = c(1L, 1L), cl = NULL, verbose = TRUE, plots = TRUE )
series |
a numeric vector or time series giving the original data for which to find the optimal block-length. |
nb |
an integer value, number of bootstrapped series to compute. |
n_iter |
an integer value, maximum number of iterations for the HHJ algorithm to compute. |
pilot_block_length |
a numeric value, the block-length ( |
sub_sample |
a numeric value, the length of each overlapping
subsample, |
k |
a character string, either |
bofb |
a numeric value, length of the basic blocks in the
block-of-blocks bootstrap, see |
search_grid |
a numeric value, the range of solutions around |
grid_step |
a numeric value or vector of at most length 2, the number of
steps to increment over the subsample block-lengths when evaluating the
|
cl |
a cluster object, created by package parallel,
doParallel, or snow. If |
verbose |
a logical value, if set to |
plots |
a logical value, if set to |
The HHJ algorithm is computationally intensive as it relies on a cross-validation process using a type of subsampling to estimate the mean squared error (MSE) incurred by the bootstrap at various block-lengths.
Under the hood, hhj() makes use of tsbootstrap (see Trapletti and Hornik, 2020) to perform the moving block-bootstrap (or the block-of-blocks bootstrap by setting bofb > 1) according to Kunsch (1989).
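The two resampling ideas mentioned above, moving blocks and overlapping subsamples, can be sketched in base R. This is a minimal illustration only: `mbb_resample`, the block length of 8, and the subsample length of 50 are hypothetical choices, and the code is not how hhj() or tsbootstrap is implemented.

```r
# Hypothetical helper (not part of this package): draw one moving-block-bootstrap
# resample of x by concatenating randomly chosen overlapping blocks of length l.
mbb_resample <- function(x, l) {
  n <- length(x)
  stopifnot(l >= 1, l <= n)
  n_blocks <- ceiling(n / l)                                  # blocks needed to cover n points
  starts <- sample.int(n - l + 1, n_blocks, replace = TRUE)   # random block start positions
  idx <- unlist(lapply(starts, function(s) s:(s + l - 1)))    # indices of the chosen blocks
  x[idx[seq_len(n)]]                                          # trim to the original length
}

set.seed(42)
x <- as.numeric(stats::arima.sim(list(order = c(1, 0, 0), ar = 0.5), n = 200))
x_star <- mbb_resample(x, l = 8)

# Overlapping subsamples of length m: one subsample starts at every admissible index.
m <- 50
subsamples <- lapply(seq_len(length(x) - m + 1), function(i) x[i:(i + m - 1)])
```

A series of length n yields n - m + 1 overlapping subsamples of length m, which is why the cross-validation step becomes expensive as the series grows.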
an object of class 'hhj'
Adrian Trapletti and Kurt Hornik (2020). tseries: Time Series Analysis and Computational Finance. R package version 0.10-48.
Kunsch, H. (1989) The Jackknife and the Bootstrap for General Stationary Observations. The Annals of Statistics, 17(3), 1217-1241. Retrieved February 16, 2021, from doi:10.1214/aos/1176347265
Peter Hall, Joel L. Horowitz, Bing-Yi Jing, On blocking rules for the bootstrap with dependent data, Biometrika, Volume 82, Issue 3, September 1995, Pages 561-574, DOI: doi:10.1093/biomet/82.3.561
# Generate AR(1) time series
sim <- stats::arima.sim(list(order = c(1, 0, 0), ar = 0.5),
                        n = 500, innov = rnorm(500))

# Calculate optimal block length for series
hhj(sim, sub_sample = 10)

# Use parallel computing
library(parallel)

# Make cluster object with 2 cores
cl <- makeCluster(2)

# Calculate optimal block length for series
hhj(sim, cl = cl)

# Release the worker processes when finished
stopCluster(cl)
This function implements the Nonparametric Plug-In (NPPI) algorithm, as proposed by Lahiri, Furukawa, and Lee (2007), to select the optimal block length for block bootstrap procedures. The NPPI method estimates the optimal block length by balancing bias and variance in block bootstrap estimators, particularly for time series and other dependent data structures. The function also leverages the Moving Block Bootstrap (MBB) method of Kunsch (1989) and the Moving Blocks Jackknife (MBJ) of Liu and Singh (1992).
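The bias-variance balance can be illustrated numerically. The sketch below assumes the mean squared error behaves, up to a common scaling factor that does not affect the minimizer, like B^2 / l^2 + V * l^r / n for block length l. The constants `B` and `V` are hypothetical placeholders, not estimates produced by nppi(), and the code is not the package's implementation.

```r
# Hypothetical constants for illustration only.
B <- 1.5    # assumed bias constant
V <- 0.8    # assumed variance constant
n <- 500    # series length
r <- 1      # rate parameter of the variance term

# Assumed MSE expansion as a function of the block length l.
mse <- function(l) B^2 / l^2 + V * l^r / n

# Setting the derivative to zero gives l^(r + 2) = 2 * B^2 * n / (r * V).
l_closed <- (2 * B^2 * n / (r * V))^(1 / (r + 2))

# Numerical check of the closed form with a one-dimensional optimizer.
l_numeric <- optimize(mse, interval = c(1, n))$minimum
```

Under this assumed form, larger bias pushes the balance toward longer blocks, while larger variance pushes it toward shorter ones.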
nppi( data, stat_function = mean, r = 1, a = 1, l = NULL, m = NULL, num_bootstrap = 1000, c_1 = 1L, epsilon = 1e-08, plots = TRUE )
data |
A numeric vector, ts, or single-column data.frame representing the time series or dependent data. |
stat_function |
A function to compute the statistic of interest
(*e.g.*, mean, variance). The function should accept a numeric vector as input
and return a scalar value (default is |
r |
The rate parameter for the MSE expansion (default is 1). This parameter controls the convergence rate in the bias-variance trade-off. |
a |
The bias exponent (default is 1). Adjust this based on the theoretical properties of the statistic being bootstrapped. |
l |
Optional. The initial block size for bias estimation.
If not provided, it is set to |
m |
Optional. The number of blocks to delete in the
Jackknife-After-Bootstrap (JAB) variance estimation. If not provided,
it defaults to |
num_bootstrap |
The number of bootstrap replications for bias estimation (default is 1000). |
c_1 |
A tuning constant for initial block size calculation (default is 1). |
epsilon |
A small constant added to the variance to prevent division by
zero (default is |
plots |
A logical value indicating whether to plot the JAB diagnostic |
Jackknife-After-Bootstrap (JAB) variance estimation (Lahiri, 2002).
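As the description of stat_function indicates, any function that takes a numeric vector and returns a scalar can be supplied. The sketch below defines a hypothetical statistic, `lag1_acf`, and only calls nppi() if the package is installed. Whether the default `r` and `a` are theoretically appropriate for this statistic is not established here and should be treated as an assumption.

```r
# Hypothetical custom statistic: lag-1 sample autocorrelation
# (numeric vector in, single numeric value out).
lag1_acf <- function(x) {
  as.numeric(stats::acf(x, lag.max = 1, plot = FALSE)$acf[2])
}

set.seed(32)
sim <- stats::arima.sim(list(order = c(1, 0, 0), ar = 0.5),
                        n = 500, innov = rnorm(500))

# Confirm the statistic returns a scalar before passing it on.
stat_value <- lag1_acf(sim)

# Only run the block-length selection when the package is available.
if (requireNamespace("blocklength", quietly = TRUE)) {
  result <- blocklength::nppi(data = sim, stat_function = lag1_acf,
                              num_bootstrap = 200, plots = FALSE)
  print(result$optimal_block_length)
}
```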
An object of class nppi with the following components:
The estimated optimal block length for the block bootstrap procedure.
The estimated bias of the block bootstrap estimator.
The estimated variance of the block bootstrap estimator using the JAB method.
The point estimates of the statistic for each deletion block in the JAB variance estimation. Used for diagnostic plots.
The pseudo-values of each JAB point value.
The initial block size used for bias estimation.
The number of blocks to delete in the JAB variance estimation.
Efron, B. (1992), 'Jackknife-after-bootstrap standard errors and influence functions (with discussion)', Journal of the Royal Statistical Society, Series B 54, 83-111.
Kunsch, H. (1989) The Jackknife and the Bootstrap for General Stationary Observations. The Annals of Statistics, 17(3), 1217-1241. Retrieved February 16, 2021, from doi:10.1214/aos/1176347265
Lahiri, S. N., Furukawa, K., & Lee, Y.-D. (2007). A nonparametric plug-in rule for selecting optimal block lengths for Block Bootstrap Methods. Statistical Methodology, 4(3), 292-321. DOI: doi:10.1016/j.stamet.2006.08.002
Lahiri, S. N. (2003). 7.4 A Nonparametric Plug-in Method. In Resampling methods for dependent data (pp. 186-197). Springer.
Liu, R. Y. and Singh, K. (1992), Moving blocks jackknife and bootstrap capture weak dependence, in R. Lepage and L. Billard, eds, 'Exploring the Limits of the Bootstrap', Wiley, New York, pp. 225-248.
# Generate AR(1) time series
set.seed(32)
sim <- stats::arima.sim(list(order = c(1, 0, 0), ar = 0.5),
                        n = 500, innov = rnorm(500))

# Estimate the optimal block length for the sample mean
result <- nppi(data = sim, stat_function = mean, num_bootstrap = 500, m = 2)
print(result$optimal_block_length)

# Use S3 method to plot JAB diagnostic
plot(result)
S3 Method for objects of class 'hhj'
## S3 method for class 'hhj'
plot(x, iter = NULL, ...)
x |
an object of class 'hhj' |
iter |
a vector of |
... |
Arguments passed on to
|
No return value, called for side effects
# Generate AR(1) time series
sim <- stats::arima.sim(list(order = c(1, 0, 0), ar = 0.5),
                        n = 500, innov = rnorm(500))

# Generate 'hhj' class object of optimal block length for series
hhj <- hhj(sim, sub_sample = 10)

## S3 method for class 'hhj'
plot(hhj)
S3 Method for objects of class 'nppi'
This function visualizes the JAB point estimates across the deletion block indices used to estimate the variance in the NPPI algorithm.
## S3 method for class 'nppi'
plot(x, ...)
x |
An object of class |
... |
Arguments passed on to
|
No return value, called for side effects
# Generate AR(1) time series
set.seed(32)
sim <- stats::arima.sim(list(order = c(1, 0, 0), ar = 0.5),
                        n = 500, innov = rnorm(500))

# Estimate the optimal block length for the sample mean
result <- nppi(data = sim, stat_function = mean, num_bootstrap = 500, m = 2)

# Use S3 method
plot(result)
Correlation Implied Hypothesis Test
S3 Method for objects of class 'pwsd'
See ?plot.acf in the stats package for more customization options on the correlogram, on which plot.pwsd is based.
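The implied hypothesis test behind the correlogram can be sketched in base R. The band c * sqrt(log10(n) / n) follows Politis and White (2004); the values `c_const = 2` and `K_N = 5`, the 50-lag cutoff, and the search loop are illustrative assumptions rather than this package's defaults or implementation.

```r
set.seed(1)
x <- as.numeric(stats::arima.sim(list(order = c(1, 0, 0), ar = 0.5), n = 500))
n <- length(x)

c_const <- 2   # illustrative constant acting as the significance level
K_N <- 5       # illustrative number of consecutive lags that must be insignificant

# Half-width of the band drawn around zero on the correlogram.
band <- c_const * sqrt(log10(n) / n)

# Sample autocorrelations at lags 1..50 (drop lag 0).
rho <- as.numeric(stats::acf(x, lag.max = 50, plot = FALSE)$acf)[-1]
inside <- abs(rho) < band

# Smallest m such that lags m + 1, ..., m + K_N all fall inside the band.
m_hat_sketch <- NA_integer_
for (m in 0:(length(rho) - K_N)) {
  if (all(inside[(m + 1):(m + K_N)])) {
    m_hat_sketch <- m
    break
  }
}
```

Raising `c_const` widens the band, so fewer autocorrelations are treated as significant and the selected lag tends to be smaller.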
## S3 method for class 'pwsd'
plot(x, c = NULL, main = NULL, ylim = NULL, ...)
x |
an object of class 'pwsd' or 'acf' |
c |
a numeric value, the constant which acts as the significance level
for the implied hypothesis test. Defaults to |
main |
an overall title for the plot, if no string is supplied a default
title will be populated. See |
ylim |
a numeric of length 2 giving the y-axis limits for the plot |
... |
Arguments passed on to
|
No return value, called for side effects
# Use S3 Method
# Generate AR(1) time series
sim <- stats::arima.sim(list(order = c(1, 0, 0), ar = 0.5),
                        n = 500, innov = rnorm(500))

b <- pwsd(sim, round = TRUE, correlogram = FALSE)
plot(b)
Run the Automatic Block-Length selection method proposed by Politis and White (2004) and corrected in Patton, Politis, and White (2009). The method is based on spectral density estimation via flat-top lag windows of Politis and Romano (1995). This code was adapted from b.star to add functionality and include correlogram support, including an S3 method; see Hayfield and Racine (2008).
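For intuition, the trapezoidal flat-top lag window used in Politis and White (2004) can be written in a few lines of base R. The function name `flat_top` is hypothetical, and this is a sketch of the window shape only, not of how pwsd() applies it.

```r
# Hypothetical helper: trapezoidal flat-top lag window.
# Equals 1 on [-1/2, 1/2], decays linearly to 0 at |t| = 1, and is 0 beyond.
flat_top <- function(t) {
  a <- abs(t)
  ifelse(a <= 0.5, 1, ifelse(a <= 1, 2 * (1 - a), 0))
}

# Evaluate the window on a small grid of scaled lags.
t_grid <- c(-0.25, 0, 0.5, 0.75, 1, 2)
weights <- flat_top(t_grid)
```

The flat region leaves low-order autocovariances unweighted, which is the property that motivates the name of the window.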
pwsd( data, K_N = NULL, M_max = NULL, m_hat = NULL, b_max = NULL, c = NULL, round = FALSE, correlogram = TRUE )
data |
an |
K_N |
an integer value, the maximum lags for the auto-correlation,
|
M_max |
an integer value, the upper-bound for the optimal number of lags,
|
m_hat |
an integer value, if set to |
b_max |
a numeric value, the upper-bound for the optimal block-length.
Defaults to |
c |
a numeric value, the constant which acts as the significance level
for the implied hypothesis test. Defaults to |
round |
a logical value, if set to |
correlogram |
a logical value, if set to |
an object of class 'pwsd'
Andrew Patton, Dimitris N. Politis & Halbert White (2009) Correction to "Automatic Block-Length Selection for the Dependent Bootstrap" by D. Politis and H. White, Econometric Reviews, 28:4, 372-375, DOI: doi:10.1080/07474930802459016
Dimitris N. Politis & Halbert White (2004) Automatic Block-Length Selection for the Dependent Bootstrap, Econometric Reviews, 23:1, 53-70, DOI: doi:10.1081/ETC-120028836
Politis, D.N. and Romano, J.P. (1995), Bias-Corrected Nonparametric Spectral Estimation. Journal of Time Series Analysis, 16: 67-103, DOI: doi:10.1111/j.1467-9892.1995.tb00223.x
Tristen Hayfield and Jeffrey S. Racine (2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software 27(5). DOI: doi:10.18637/jss.v027.i05
# Generate AR(1) time series
sim <- stats::arima.sim(list(order = c(1, 0, 0), ar = 0.5),
                        n = 500, innov = rnorm(500))

# Calculate optimal block length for series
pwsd(sim, round = TRUE)

# Use S3 Method
b <- pwsd(sim, round = TRUE, correlogram = FALSE)
plot(b)