Catalogue Search | MBRL
5,019 result(s) for "Bootstrap method"
BOOTSTRAP LONG MEMORY PROCESSES IN THE FREQUENCY DOMAIN
2021
The aim of the paper is to describe a bootstrap that, unlike the sieve bootstrap, is valid under either long memory (LM) or short memory (SM) dependence. One reason the sieve bootstrap fails in our context is that, under LM dependence, it may not be able to capture the true covariance structure of the original data. We also describe and examine the validity of the bootstrap scheme for the least squares estimator of the parameter in a regression model and for model specification. The motivation for the latter example comes from the observation that the asymptotic distribution of the test is intractable.
Journal Article
Ensemble Approach for Financial Time Series Modeling
2026
This study provides a comprehensive evaluation of bagging ensemble models for financial time series (FTS) classification and addresses a gap in the literature regarding how bootstrap methods, ensemble sizes, voting mechanisms, and loss functions jointly influence model performance. The analysis evaluates decision tree (DT), logistic regression (LR), and multi-layer perceptron (MLP) ensemble models modified by six time series bootstrap methods, five ensemble sizes, and three voting mechanisms across six FTS data sets. The study also examines the influence of entropy- and profit-based loss functions within particle swarm (PSO) and quantum-inspired particle swarm (QPSO) optimization for weighted voting. The results show that LR-based ensembles provide the strongest overall performance and outperform ARIMA, DT, LR, MLP, and LSTM baseline models on both accuracy and profit metrics. Bootstrap effects are model specific. DT and MLP ensembles perform best under the Tukey bootstrap, while LR ensembles achieve strong results under the block bootstrap, the sub-sample bootstrap method, and the Tukey method, and remain the strongest performers across all bootstrap configurations. Optimized voting mechanisms yield clear improvements over equal-weight majority voting, with the profit loss function producing the most consistent gains. The findings also indicate that FTS classification problems exhibit an optimal range of ensemble sizes, as larger ensembles do not always improve performance. The study contributes a systematic assessment of ensemble design choices for FTS classification and highlights the importance of jointly considering bootstrap diversity, ensemble size, and voting strategy when developing ensemble models for financial applications.
Journal Article
More reliable inference for the dissimilarity index of segregation
by Windmeijer, Frank; Allen, Rebecca; Burgess, Simon
in Adjustment, Bootstrap mechanism, Bootstrap method
2015
The most widely used measure of segregation is the so-called dissimilarity index. It is now well understood that this measure also reflects randomness in the allocation of individuals to units (i.e. it measures deviations from evenness, not deviations from randomness). This leads to potentially large values of the segregation index when unit sizes and/or minority proportions are small, even if there is no underlying systematic segregation. Our response to this is to produce adjustments to the index, based on an underlying statistical model. We specify the assignment problem in a very general way, with differences in conditional assignment probabilities underlying the resulting segregation. From this, we derive a likelihood ratio test for the presence of any systematic segregation, and bias adjustments to the dissimilarity index. We further develop the asymptotic distribution theory for testing hypotheses concerning the magnitude of the segregation index and show that the use of bootstrap methods can improve the size and power properties of test procedures considerably. We illustrate these methods by comparing dissimilarity indices across school districts in England to measure social segregation.
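The effect described in this abstract, a large index value arising from random allocation alone when units are small, can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' adjustment or test: it uses the standard definition D = 0.5 * sum_i |a_i/A - b_i/B|, and the function names, unit sizes, and minority proportion are hypothetical choices made for illustration.

```python
import random

def dissimilarity_index(minority, majority):
    """Standard dissimilarity index: D = 0.5 * sum_i |a_i/A - b_i/B|."""
    total_min = sum(minority)
    total_maj = sum(majority)
    if total_min == 0 or total_maj == 0:
        raise ValueError("both groups need at least one member")
    return 0.5 * sum(abs(a / total_min - b / total_maj)
                     for a, b in zip(minority, majority))

def random_allocation_index(n_units, unit_size, p_minority, rng):
    """Index when each individual is minority with the same probability,
    i.e. there is no systematic segregation at all (illustrative setup)."""
    minority = [sum(rng.random() < p_minority for _ in range(unit_size))
                for _ in range(n_units)]
    majority = [unit_size - a for a in minority]
    return dissimilarity_index(minority, majority)

rng = random.Random(0)
# Hypothetical settings: 50 units, 10% minority, small vs. large units.
small = [random_allocation_index(50, 10, 0.1, rng) for _ in range(50)]
large = [random_allocation_index(50, 500, 0.1, rng) for _ in range(50)]
mean_small = sum(small) / len(small)
mean_large = sum(large) / len(large)
print(f"mean D, unit size 10:  {mean_small:.3f}")
print(f"mean D, unit size 500: {mean_large:.3f}")
```

Both scenarios have zero systematic segregation by construction, so any difference between the two means reflects allocation randomness only; the index should come out noticeably larger for the small units.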
Journal Article
Standard Errors of IRT Parameter Scale Transformation Coefficients: Comparison of Bootstrap Method, Delta Method, and Multiple Imputation Method
2019
The present study evaluated the multiple imputation method, a procedure that is similar to the one suggested by Li and Lissitz (2004), and compared the performance of this method with that of the bootstrap method and the delta method in obtaining the standard errors for the estimates of the parameter scale transformation coefficients in item response theory (IRT) equating in the context of the common-item nonequivalent groups design. Two different estimation procedures for the variance-covariance matrix of the IRT item parameter estimates, which were used in both the delta method and the multiple imputation method, were considered: empirical cross-product (XPD) and supplemented expectation maximization (SEM). The results of the analyses with simulated and real data indicate that the multiple imputation method generally produced very similar results to the bootstrap method and the delta method in most of the conditions. The differences between the estimated standard errors obtained by the methods using the XPD matrices and the SEM matrices were very small when the sample size was reasonably large. When the sample size was small, the methods using the XPD matrices appeared to yield slight upward bias for the standard errors of the IRT parameter scale transformation coefficients.
Journal Article
Empirical simulation of internal validation methods for prediction models: comparing k-fold cross-validation with bootstrap-based optimism correction
by Yan, Ruohua; Peng, Xiaoxia; Liu, Xiaohang
in Acute Kidney Injury - epidemiology, Algorithms, Bias
2026
To systematically evaluate the performance of k-fold cross-validation and bootstrap-based optimism correction methods for internal validation of statistical and machine learning models.
A total of 239,415 inpatients were extracted from an open access database named Medical Information Mart for Intensive Care IV, of which 39,145 were randomly sampled as a predefined reference dataset. Among the remaining simulation dataset with 200,000 inpatients, training sets with sample sizes ranging from 595 to 5946 were randomly selected, and multiple prediction models were developed in each training set using various modeling strategies, including logistic regression, least absolute shrinkage and selection operator regression, Naive Bayes, Support Vector Machine, K-Nearest Neighbors, Light Gradient Boosting Machine, and Random Forest. The dependent variable of the model was acute kidney injury (AKI), a binary outcome with an incidence of 18.5%, and the independent variables included 22 common predictors of AKI. For each model, 2-fold, 5-fold, and 10-fold cross-validation were used for internal validation to calculate area under the receiver-operating characteristic curve (AUC), which is a common metric for quantifying the overall ability of a model to discriminate between positive or negative classifications. In addition, the Harrell, .632, and .632+ AUC estimators were calculated for internal validation based on bootstrapping. The above simulation process was repeated 1000 times to obtain 1000 estimates of AUC for each internal validation method of each model. The model performance was simultaneously evaluated in the reference dataset to obtain an empirical AUC (analogous to the “gold standard”). Then, by comparing the 1000 AUC estimates with the empirical AUC, the accuracy of internal validation methods for different models was assessed.
For parametric models, the .632+ estimator provided the most accurate estimates of AUC, followed by 10-fold cross-validation with only slight bias. In contrast, for nonparametric models, all bootstrap-based optimism correction methods significantly overestimated AUC, and the overestimation was not reduced by increasing the sample size. Most strikingly, 10-fold cross-validation demonstrated stable and good performance across all scenarios considered, regardless of the modeling strategy or sample size.
The performance of bootstrap-based optimism correction methods can be affected by model complexity, although the .632+ estimator performs best for parametric models trained on small samples. In comparison, 10-fold cross-validation is more robust and easier to implement. Therefore, it is recommended to prioritize 10-fold cross-validation as the internal validation method for prediction models.
With the exponential growth of clinical prediction models, the methods for conducting internal validation of these models remain controversial. Both k-fold cross-validation and bootstrap-based optimism correction methods are recommended by guidance papers. However, the issue of whether they are applicable to all modeling strategies, especially machine learning algorithms, still lacks evidence. This study simulated various sample size scenarios based on real-world clinical data, and developed AKI prediction models based on parametric and nonparametric modeling strategies. Then, internal validation was performed for each model using different methods. The results showed that bootstrap-based optimism correction methods were suitable for parametric models. However, as the model complexity increased, the bias of bootstrap-based optimism correction methods increased accordingly. In contrast, 10-fold cross-validation performed well in all scenarios, regardless of the modeling strategy or sample size. Therefore, 10-fold cross-validation is recommended as a preferred method for internal validation of prediction models.
Key findings: Through a comprehensive comparison of internal validation methods in statistical and machine learning models, this study found that cross-validation, especially the 10-fold cross-validation, demonstrated stable and good performance across a wide range of scenarios.
What this adds to what is known: Although bootstrap-based optimism correction methods provide accurate estimates of model performance in parametric models, they exhibit significant overestimation in nonparametric models.
What is the implication and what should change now: Given that cross-validation is more robust and computationally less costly to implement than the bootstrap-based methods, it should be recommended as a preferred method for internal validation of prediction models.
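The two families of internal validation compared in this abstract can be sketched side by side. The code below is a minimal illustration under stated assumptions, not the study's pipeline: it uses synthetic data, a hypothetical toy scorer (`fit_centroid_scorer`) in place of the models named in the abstract, and implements only the Harrell optimism correction and k-fold cross-validation (the .632 and .632+ estimators are not shown).

```python
import random

def auc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count as 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_centroid_scorer(X, y):
    """Hypothetical toy model: project onto the difference of class means."""
    d = len(X[0])
    def class_mean(cls):
        rows = [x for x, t in zip(X, y) if t == cls]
        return [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    w = [a - b for a, b in zip(class_mean(1), class_mean(0))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))

def harrell_corrected_auc(X, y, n_boot, rng):
    """Apparent AUC minus the average bootstrap optimism."""
    model = fit_centroid_scorer(X, y)
    apparent = auc([model(x) for x in X], y)
    n, optimism, done = len(X), 0.0, 0
    while done < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:          # resample lacks one class; redraw
            continue
        Xb = [X[i] for i in idx]
        boot_model = fit_centroid_scorer(Xb, yb)
        auc_boot = auc([boot_model(x) for x in Xb], yb)   # on its own sample
        auc_orig = auc([boot_model(x) for x in X], y)     # on original data
        optimism += auc_boot - auc_orig
        done += 1
    return apparent, apparent - optimism / n_boot

def kfold_auc(X, y, k, rng):
    """Mean held-out AUC over k folds."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    fold_aucs = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_centroid_scorer([X[i] for i in train],
                                    [y[i] for i in train])
        fold_aucs.append(auc([model(X[i]) for i in fold],
                             [y[i] for i in fold]))
    return sum(fold_aucs) / k

# Synthetic data (assumption): 5 features, only the first carries signal.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(float(label) if j == 0 else 0.0, 1.0) for j in range(5)]
     for label in y]

apparent, corrected = harrell_corrected_auc(X, y, 50, rng)
cv = kfold_auc(X, y, 10, rng)
print(f"apparent={apparent:.3f} harrell={corrected:.3f} 10-fold={cv:.3f}")
```

Swapping the toy scorer for a more flexible model is where, according to the abstract, the bootstrap-based correction would be expected to become optimistic; this sketch does not attempt to reproduce that finding.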
Journal Article
Testing and Estimating Shape-Constrained Nonparametric Density and Regression in the Presence of Measurement Error
by Carroll, Raymond J.; Delaigle, Aurore; Hall, Peter
in Applications, Bootstrap method, Bootstrap methods
2011
In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y, is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However, in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this article we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.
Journal Article
INFERENCE ON COUNTERFACTUAL DISTRIBUTIONS
by Melly, Blaise; Fernández-Val, Iván; Chernozhukov, Victor
in Analytical estimating, Bootstrap mechanism, Bootstrap method
2013
Counterfactual distributions are important ingredients for policy analysis and decomposition analysis in empirical economics. In this article, we develop modeling and inference tools for counterfactual distributions based on regression methods. The counterfactual scenarios that we consider consist of ceteris paribus changes in either the distribution of covariates related to the outcome of interest or the conditional distribution of the outcome given covariates. For either of these scenarios, we derive joint functional central limit theorems and bootstrap validity results for regression-based estimators of the status quo and counterfactual outcome distributions. These results allow us to construct simultaneous confidence sets for function-valued effects of the counterfactual changes, including the effects on the entire distribution and quantile functions of the outcome as well as on related functionals. These confidence sets can be used to test functional hypotheses such as no-effect, positive effect, or stochastic dominance. Our theory applies to general counterfactual changes and covers the main regression methods including classical, quantile, duration, and distribution regressions. We illustrate the results with an empirical application to wage decompositions using data for the United States. As a part of developing the main results, we introduce distribution regression as a comprehensive and flexible tool for modeling and estimating the entire conditional distribution. We show that distribution regression encompasses the Cox duration regression and represents a useful alternative to quantile regression. We establish functional central limit theorems and bootstrap validity results for the empirical distribution regression process and various related functionals.
Journal Article
Is there really any Contagion among Major Equity and Securitized Real Estate Markets? Analysis from a New Perspective
2018
This study examines contagion across general equity and securitized real estate markets of China, Hong Kong and the US during the Chinese financial crisis. This is the first study to combine the case-resampling bootstrap method with the coskewness and cokurtosis test. Thus the new method works well on data with a non-normal distribution or non-constant variance. Additional channels of contagion may also be detected to reflect a more precise pattern of contagion. In contrast to the result of Hatemi-J and Hacker (Applied Financial Economics Letters, 1(6), 343-347, 2005), we find that the case-resampling bootstrap method diminishes the overall effect of contagion. In particular, no additional channels of contagion can be found when the case-resampling bootstrap method is applied to the coskewness test, but when it is applied to the cokurtosis test, additional channels of contagion are detected. Furthermore, the overall effect of contagion is greater on the general equity markets than on the securitized real estate markets. This study has useful implications for investors, regulators and policy makers.
Journal Article
New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0
2010
PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.
Journal Article
CENTRAL LIMIT THEOREMS AND BOOTSTRAP IN HIGH DIMENSIONS
by Kato, Kengo; Chetverikov, Denis; Chernozhukov, Victor
in Approximation, Bootstrap method, Convex analysis
2017
This paper derives central limit and bootstrap theorems for probabilities that sums of centered high-dimensional random vectors hit hyperrectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for probabilities $\mathrm{P}\left(n^{-1/2}\sum_{i=1}^{n} X_i \in A\right)$, where $X_1, \ldots, X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a hyperrectangle, or more generally, a sparsely convex set, and show that the approximation error converges to zero even if $p = p_n \to \infty$ as $n \to \infty$ and $p \gg n$; in particular, $p$ can be as large as $O\left(e^{Cn^{c}}\right)$ for some constants $c, C > 0$. The result holds uniformly over all hyperrectangles, or more generally, sparsely convex sets, and does not require any restriction on the correlation structure among coordinates of $X_i$. Sparsely convex sets are sets that can be represented as intersections of many convex sets whose indicator functions depend only on a small subset of their arguments, with hyperrectangles being a special case.
Journal Article