Catalogue Search | MBRL
2,230 result(s) for "Cross-validation"
Cross-Validation Visualized: A Narrative Guide to Advanced Methods
2024
This study delves into the multifaceted nature of cross-validation (CV) techniques in machine learning model evaluation and selection, underscoring the challenge of choosing the most appropriate method due to the plethora of available variants. It aims to clarify and standardize terminology such as sets, groups, folds, and samples pivotal in the CV domain, and introduces an exhaustive compilation of advanced CV methods like leave-one-out, leave-p-out, Monte Carlo, grouped, stratified, and time-split CV within a hold-out CV framework. Through graphical representations, the paper enhances the comprehension of these methodologies, facilitating more informed decision making for practitioners. It further explores the synergy between different CV strategies and advocates for a unified approach to reporting model performance by consolidating essential metrics. The paper culminates in a comprehensive overview of the CV techniques discussed, illustrated with practical examples, offering valuable insights for both novice and experienced researchers in the field.
Journal Article
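The CV variants listed in this abstract differ mainly in how the validation indices are drawn. Below is a minimal Python sketch of three of them: k-fold, leave-p-out (leave-one-out when p = 1) and Monte Carlo. The function names are hypothetical, and the code is not taken from the paper.

```python
import itertools
import random
from typing import Iterator, List, Optional, Tuple

# A split is (training indices, validation indices).
Split = Tuple[List[int], List[int]]


def k_fold(n: int, k: int, seed: Optional[int] = None) -> Iterator[Split]:
    """Partition range(n) into k near-equal folds; each fold is validated once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra sample.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        start += size
        yield sorted(train), sorted(test)


def leave_p_out(n: int, p: int) -> Iterator[Split]:
    """Hold out every size-p subset exactly once; p = 1 is leave-one-out."""
    for test in itertools.combinations(range(n), p):
        held = set(test)
        yield [i for i in range(n) if i not in held], list(test)


def monte_carlo(n: int, test_size: int, repeats: int, seed: int = 0) -> Iterator[Split]:
    """Repeated random hold-out; validation sets may overlap across repeats."""
    rng = random.Random(seed)
    for _ in range(repeats):
        test = sorted(rng.sample(range(n), test_size))
        held = set(test)
        yield [i for i in range(n) if i not in held], test
```

With k-fold, every sample is validated exactly once. Leave-p-out yields C(n, p) splits, so its cost grows quickly with p. Monte Carlo CV sets the number of repeats independently of the validation-set size, but some samples may never be held out.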
Objective Quantification of In-Hospital Patient Mobilization after Cardiac Surgery Using Accelerometers: Selection, Use, and Analysis
by Veltink, Peter H.; van Delden, Robby W.; Halfwerk, Frank R.
in Accelerometry, activity classification, Cardiac Surgical Procedures
2021
Cardiac surgery patients infrequently mobilize during their hospital stay. It is unclear for patients why mobilization is important, and exact progress of mobilization activities is not available. The aim of this study was to select and evaluate accelerometers for objective quantification of in-hospital mobilization after cardiac surgery. Six static and dynamic patient activities were defined to measure patient mobilization during the postoperative hospital stay. Device requirements were formulated, and the available devices reviewed. A triaxial accelerometer (AX3, Axivity) was selected for a clinical pilot in a heart surgery ward and placed on both the upper arm and upper leg. An artificial neural network algorithm was applied to classify lying in bed, sitting in a chair, standing, walking, cycling on an exercise bike, and walking the stairs. The primary endpoint was the daily amount of each activity performed between 7 a.m. and 11 p.m. The secondary endpoints were length of intensive care unit stay and surgical ward stay. A subgroup analysis for male and female patients was planned. In total, 29 patients were classified after cardiac surgery with an intensive care unit stay of 1 (1 to 2) night and surgical ward stay of 5 (3 to 6) nights. Patients spent 41 (20 to 62) min less time in bed for each consecutive hospital day, as determined by a mixed-model analysis (p < 0.001). Standing, walking, and walking the stairs increased during the hospital stay. No differences between men (n = 22) and women (n = 7) were observed for all endpoints in this study. The approach presented in this study is applicable for measuring all six activities and for monitoring postoperative recovery of cardiac surgery patients. A next step is to provide feedback to patients and healthcare professionals, to speed up recovery.
Journal Article
Fast stable direct fitting and smoothness selection for generalized additive models
2008
Existing computationally efficient methods for penalized likelihood generalized additive model fitting employ iterative smoothness selection on working linear models (or working mixed models). Such schemes fail to converge for a non-negligible proportion of models, with failure being particularly frequent in the presence of concurvity. If smoothness selection is performed by optimizing 'whole model' criteria these problems disappear, but until now attempts to do this have employed finite-difference-based optimization schemes which are computationally inefficient and can suffer from false convergence. The paper develops the first computationally efficient method for direct generalized additive model smoothness selection. It is highly stable, but by careful structuring achieves a computational efficiency that leads, in simulations, to lower mean computation times than the schemes that are based on working model smoothness selection. The method also offers a reliable way of fitting generalized additive mixed models.
Journal Article
Cross‐validated permutation feature importance considering correlation between features
2022
In molecular design, material design, process design, and process control, it is important not only to construct a model with high predictive ability between explanatory features x and objective features y using a dataset but also to interpret the constructed model. An index of feature importance in x is permutation feature importance (PFI), which can be combined with any regressors and classifiers. However, the PFI becomes unstable when the number of samples is low because it is necessary to divide a dataset into training and validation data when calculating it. Additionally, when there are strongly correlated features in x, the PFI of these features is estimated to be low. Hence, a cross‐validated PFI (CVPFI) method is proposed. CVPFI can be calculated stably, even with a small number of samples, because model construction and feature evaluation are repeated based on cross‐validation. Furthermore, by considering the absolute correlation coefficients between the features, the feature importance can be evaluated appropriately even when there are strongly correlated features in x. Case studies using numerical simulation data and actual compound data showed that the feature importance can be evaluated appropriately using CVPFI compared to PFI. This is possible when the number of samples is low, when linear and nonlinear relationships are mixed between x and y, when there are strong correlations between features in x, and when quantised and biased features exist in x. Python codes for CVPFI are available at https://github.com/hkaneko1985/dcekit.
Journal Article
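The cross-validated part of CVPFI can be sketched as follows. The model is refitted on each training fold, and the sketch records how much permuting one feature in the held-out fold increases the error, averaged over folds and permutations. This is only a hedged illustration: it omits the paper's correlation-based adjustment, and the toy k-nearest-neighbour regressor and all function names are assumptions. The authors' own implementation is the dcekit code linked above.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def knn_fit(X: List[List[float]], y: List[float], k: int = 3) -> Model:
    """Toy k-nearest-neighbour regressor; any fit-then-predict model could be used."""
    def predict(x: Sequence[float]) -> float:
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xi)), yi) for xi, yi in zip(X, y)
        )
        return mean(yi for _, yi in ranked[:k])
    return predict


def mse(model: Model, X: List[List[float]], y: List[float]) -> float:
    return mean((model(x) - t) ** 2 for x, t in zip(X, y))


def cv_permutation_importance(X, y, n_folds=5, n_repeats=5, fit=knn_fit, seed=0):
    """Mean increase in validation MSE when each feature is permuted, over CV folds."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    order = list(range(n))
    rng.shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    importance = [0.0] * d
    for test in folds:
        held = set(test)
        train = [i for i in range(n) if i not in held]
        # Refit on every training fold, so each sample is used for evaluation once.
        model = fit([X[i] for i in train], [y[i] for i in train])
        X_val = [list(X[i]) for i in test]
        y_val = [y[i] for i in test]
        baseline = mse(model, X_val, y_val)
        for j in range(d):
            for _ in range(n_repeats):
                column = [row[j] for row in X_val]
                rng.shuffle(column)
                X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_val, column)]
                importance[j] += (mse(model, X_perm, y_val) - baseline) / (n_folds * n_repeats)
    return importance
```

If y depends only on the first of two features, the first importance should be large and the second close to zero. Averaging over folds and repeats is what reduces the instability that a single train/validation split causes with few samples.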
Predictive model assessment and selection in composite-based modeling using PLS-SEM: extensions and guidelines for using CVPAT
by Ringle, Christian M.; Liengaard, Benjamin D.; Hair, Joseph F.
in Ability tests, Accuracy, Expected values
2023
Purpose
Researchers often stress the predictive goals of their partial least squares structural equation modeling (PLS-SEM) analyses. However, the method has long lacked a statistical test to compare different models in terms of their predictive accuracy and to establish whether a proposed model offers a significantly better out-of-sample predictive accuracy than a naïve benchmark. This paper aims to address this methodological research gap in predictive model assessment and selection in composite-based modeling.
Design/methodology/approach
Recent research has proposed the cross-validated predictive ability test (CVPAT) to compare theoretically established models. This paper proposes several extensions that broaden the scope of CVPAT and explains the key choices researchers must make when using them. A popular marketing model is used to illustrate the CVPAT extensions’ use and to make recommendations for the interpretation and benchmarking of the results.
Findings
This research asserts that prediction-oriented model assessments and comparisons are essential for theory development and validation. It recommends that researchers routinely consider the application of CVPAT and its extensions when analyzing their theoretical models.
Research limitations/implications
The findings offer several avenues for future research to extend and strengthen prediction-oriented model assessment and comparison in PLS-SEM.
Practical implications
Guidelines are provided for applying CVPAT extensions and reporting the results to help researchers substantiate their models’ predictive capabilities.
Originality/value
This research contributes to strengthening the predictive model validation practice in PLS-SEM, which is essential to derive managerial implications that are typically predictive in nature.
Journal Article
Hybrid breeding of rice via genomic selection
by Li, Ruidong; Ali, Jauhar; Cui, Yanru
in Agronomy, best linear unbiased prediction, biotechnology
2020
Summary
Hybrid breeding is the main strategy for improving productivity in many crops, especially in rice and maize. Genomic hybrid breeding is a technology that uses whole‐genome markers to predict future hybrids. Predicted superior hybrids are then field evaluated and released as new hybrid cultivars after their superior performances are confirmed. This will increase the opportunity of selecting true superior hybrids with minimum costs. Here, we used genomic best linear unbiased prediction to perform hybrid performance prediction using an existing rice population of 1495 hybrids. Replicated 10‐fold cross‐validations showed that the prediction abilities on ten agronomic traits ranged from 0.35 to 0.92. Using the 1495 rice hybrids as a training sample, we predicted six agronomic traits of 100 hybrids derived from half diallel crosses involving 21 parents that are different from the parents of the hybrids in the training sample. The prediction abilities were relatively high, varying from 0.54 (yield) to 0.92 (grain length). We concluded that the current population of 1495 hybrids can be used to predict hybrids from seemingly unrelated parents. Eventually, we used this training population to predict all potential hybrids of cytoplasm male sterile lines from 3000 rice varieties from the 3K Rice Genome Project. Using a breeding index combining 10 traits, we identified the top and bottom 200 predicted hybrids. SNP genotypes of the training population and parameters estimated from this training population are available for general uses and further validation in genomic hybrid prediction of all potential hybrids generated from all varieties of rice.
Journal Article
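Replicated k-fold cross-validation that reports "prediction ability" as a correlation can be sketched in plain Python. The abstract uses GBLUP. This sketch substitutes ridge regression on additively coded markers, which is closely related to GBLUP but is not the authors' model. Within each replicate it pools the out-of-fold predictions before correlating them with the observations; that pooling choice is an assumption. All function names are hypothetical.

```python
import random
from math import sqrt
from typing import List


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam=1.0):
    """Ridge on centred markers: (Xc'Xc + lam*I) beta = Xc'(y - ybar)."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * (t - ybar) for r, t in zip(Xc, y)) for i in range(p)]
    beta = solve(A, b)
    return lambda x: ybar + sum(bj * (xj - mj) for bj, xj, mj in zip(beta, x, xbar))


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def replicated_cv_ability(X, y, k=10, replicates=5, seed=0):
    """Mean over replicates of corr(out-of-fold prediction, observation)."""
    rng = random.Random(seed)
    n = len(y)
    abilities = []
    for _ in range(replicates):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random partition for every replicate
        pred = [0.0] * n
        for f in range(k):
            test = order[f::k]
            held = set(test)
            train = [i for i in range(n) if i not in held]
            model = ridge_fit([X[i] for i in train], [y[i] for i in train])
            for i in test:
                pred[i] = model(X[i])
        abilities.append(pearson(pred, y))
    return sum(abilities) / replicates
```

Replicating the partition averages out the luck of any single fold assignment. The spread of the per-replicate correlations can also be reported alongside the mean.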
PREDICTIVE INFERENCE WITH THE JACKKNIFE
by Ramdas, Aaditya; Candès, Emmanuel J.; Barber, Rina Foygel
in Algorithms, Confidence intervals, Data points
2021
This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in the fitted regression function. Assuming exchangeable training samples, we prove that this crucial modification permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. Such guarantees are not possible for the original jackknife and we demonstrate examples where the coverage rate may actually vanish. Our theoretical and empirical analysis reveals that the jackknife and the jackknife+ intervals achieve nearly exact coverage and have similar lengths whenever the fitting algorithm obeys some form of stability. Further, we extend the jackknife+ to K-fold cross validation and similarly establish rigorous coverage properties. Our methods are related to cross-conformal prediction proposed by Vovk (Ann. Math. Artif. Intell. 74 (2015) 9–28) and we discuss connections.
Journal Article
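The jackknife+ interval described in this abstract can be written down directly. Fit the model n times with one point left out and record each leave-one-out residual R_i. The lower endpoint is the floor(α(n+1))-th smallest of μ₋ᵢ(x) − R_i, and the upper endpoint is the ceil((1−α)(n+1))-th smallest of μ₋ᵢ(x) + R_i. For level α, the paper's guarantee is coverage of at least 1 − 2α. This is a minimal sketch that assumes a least-squares line as the (symmetric) base algorithm; the function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Predictor = Callable[[float], float]


def fit_line(xs: List[float], ys: List[float]) -> Predictor:
    """Least-squares line y = a + b*x; treats all training points symmetrically."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def jackknife_plus(xs, ys, x_new, alpha=0.1, fit=fit_line) -> Tuple[float, float]:
    n = len(xs)
    lows, highs = [], []
    for i in range(n):
        mu = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])  # leave point i out
        r = abs(ys[i] - mu(xs[i]))  # leave-one-out residual R_i
        lows.append(mu(x_new) - r)
        highs.append(mu(x_new) + r)
    lows.sort()
    highs.sort()
    k = math.floor(alpha * (n + 1))  # rank of the lower endpoint among the lows
    if k < 1:
        raise ValueError("need alpha * (n + 1) >= 1")
    # ceil((1 - alpha)(n + 1)) == n + 1 - k, i.e. the k-th largest of the highs.
    return lows[k - 1], highs[n - k]
```

Unlike the plain jackknife, which centres its interval on the full-data prediction, this version keeps each leave-one-out prediction μ₋ᵢ(x_new) paired with its own residual. That pairing is the modification the abstract credits for the distribution-free guarantee.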
Model selection using information criteria, but is the "best" model any good?
by Thomson, James R.; Duncan, Richard P.; Mac Nally, Ralph
in Adequacy, applied ecology, COMMENTARY
2018
1. Information criteria (ICs) are used widely for data summary and model building in ecology, especially in applied ecology and wildlife management. Although ICs are useful for distinguishing among rival candidate models, ICs do not necessarily indicate whether the "best" model (or a model-averaged version) is a good representation of the data or whether the model has useful "explanatory" or "predictive" ability.
2. As editors and reviewers, we have seen many submissions that did not evaluate whether the nominal "best" model(s) found using IC is a useful model in the above sense.
3. We scrutinized six leading ecological journals for papers that used IC to compare models. More than half of papers using IC for model comparison did not evaluate the adequacy of the best model(s) in either "explaining" or "predicting" the data.
4. Synthesis and applications. Authors need to evaluate the adequacy of the model identified as "best" by information criteria methods to provide convincing evidence to readers and users that inferences from the best models are useful and reliable.
Journal Article
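The commentary argues that an information criterion only ranks candidate models. A small sketch makes the point concrete. For Gaussian least-squares fits, AIC can be computed, up to an additive constant, as n·ln(RSS/n) + 2k, where k counts the mean parameters plus the variance. Some model always comes out "best", even when none explains the data. A separate adequacy check is needed; this sketch uses R² only as an example, and the function names and test data are assumptions for illustration.

```python
import math
from typing import List, Tuple


def fit_rss(xs: List[float], ys: List[float], with_slope: bool) -> float:
    """Residual sum of squares of an intercept-only or straight-line least-squares fit."""
    n = len(ys)
    my = sum(ys) / n
    if not with_slope:
        return sum((y - my) ** 2 for y in ys)
    mx = sum(xs) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return sum((y - (my + b * (x - mx))) ** 2 for x, y in zip(xs, ys))


def gaussian_aic(rss: float, n: int, k: int) -> float:
    """AIC up to an additive constant for a Gaussian model with k parameters."""
    return n * math.log(rss / n) + 2 * k


def compare(xs: List[float], ys: List[float]) -> Tuple[str, float]:
    """Return the AIC-preferred model and its R-squared as a simple adequacy check."""
    n = len(ys)
    tss = fit_rss(xs, ys, with_slope=False)
    rss = {"intercept": tss, "line": fit_rss(xs, ys, with_slope=True)}
    aic = {"intercept": gaussian_aic(rss["intercept"], n, 2),  # mean + variance
           "line": gaussian_aic(rss["line"], n, 3)}            # mean + slope + variance
    best = min(aic, key=aic.get)
    return best, 1.0 - rss[best] / tss
```

For a response that simply alternates between +1 and −1 over x = 0..49, there is no trend to find. AIC still names the intercept-only model as "best", and its R² of 0 shows that this model explains nothing. Ranking and adequacy are different questions.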
Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models
2011
Recent work by Reiss and Ogden provides a theoretical basis for sometimes preferring restricted maximum likelihood (REML) to generalized cross-validation (GCV) for smoothing parameter selection in semiparametric regression. However, existing REML or marginal likelihood (ML) based methods for semiparametric generalized linear models (GLMs) use iterative REML or ML estimation of the smoothing parameters of working linear approximations to the GLM. Such indirect schemes need not converge and fail to do so in a non-negligible proportion of practical analyses. By contrast, very reliable prediction error criteria smoothing parameter selection methods are available, based on direct optimization of GCV, or related criteria, for the GLM itself. Since such methods directly optimize properly defined functions of the smoothing parameters, they have much more reliable convergence properties. The paper develops the first such method for REML or ML estimation of smoothing parameters. A Laplace approximation is used to obtain an approximate REML or ML for any GLM, which is suitable for efficient direct optimization. This REML or ML criterion requires that Newton-Raphson iteration, rather than Fisher scoring, be used for GLM fitting, and a computationally stable approach to this is proposed. The REML or ML criterion itself is optimized by a Newton method, with the derivatives required obtained by a mixture of implicit differentiation and direct methods. The method will cope with numerical rank deficiency in the fitted model and in fact provides a slight improvement in numerical robustness on the earlier method of Wood for prediction error criteria based smoothness selection. Simulation results suggest that the new REML and ML methods offer some improvement in mean-square error performance relative to GCV or Akaike's information criterion in most cases, without the small number of severe undersmoothing failures to which Akaike's information criterion and GCV are prone. This is achieved at the same computational cost as GCV or Akaike's information criterion. The new approach also eliminates the convergence failures of previous REML- or ML-based approaches for penalized GLMs and usually has lower computational cost than these alternatives. Example applications are presented in adaptive smoothing, scalar on function regression and generalized additive model selection.
Journal Article
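For a linear smoother with influence matrix A_λ, the GCV score that this abstract compares against REML is V(λ) = n‖y − A_λy‖² / (n − tr A_λ)², and it is minimised over λ. The sketch below evaluates that criterion for a deliberately simple discrete smoother with a first-difference penalty, f = (I + λDᵀD)⁻¹y, solved as a tridiagonal system. This choice of smoother is an assumption, not the paper's GLM setting, and the sketch does not implement the paper's Laplace-approximate REML method. Function names are hypothetical.

```python
from typing import List, Sequence


def solve_tridiagonal(off: Sequence[float], diag: Sequence[float],
                      rhs: Sequence[float]) -> List[float]:
    """Thomas algorithm for a symmetric tridiagonal system with off-diagonal `off`."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        c[i] = off[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def smooth(y: Sequence[float], lam: float) -> List[float]:
    """Minimise ||y - f||^2 + lam * sum (f[i+1] - f[i])^2, i.e. f = (I + lam D'D)^-1 y."""
    n = len(y)
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = [-lam] * (n - 1)
    return solve_tridiagonal(off, diag, y)


def gcv(y: Sequence[float], lam: float) -> float:
    """GCV score n * RSS / (n - tr A)^2 for the smoother above (lam > 0)."""
    n = len(y)
    fitted = smooth(y, lam)
    rss = sum((a - b) ** 2 for a, b in zip(y, fitted))
    # tr A = sum of the diagonal of (I + lam D'D)^-1, one solve per unit vector (O(n^2)).
    trace = sum(smooth([1.0 if j == i else 0.0 for j in range(n)], lam)[i]
                for i in range(n))
    return n * rss / (n - trace) ** 2
```

A typical use is a grid search, for example `min([0.01, 0.1, 1.0, 10.0, 100.0], key=lambda lam: gcv(y, lam))`. The paper's point is that optimising a criterion directly, rather than iterating on working models, gives reliable convergence. GCV lends itself to this because it is an explicit function of λ.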
Model selection and assessment for multi-species occupancy models
by Broms, Kristin M.; Fitzpatrick, Ryan M.; Hooten, Mevin B.
in Animals, Bayes Theorem, Bayesian analysis
2016
While multi-species occupancy models (MSOMs) are emerging as a popular method for analyzing biodiversity data, formal checking and validation approaches for this class of models have lagged behind. Concurrent with the rise in application of MSOMs among ecologists, a quiet regime shift is occurring in Bayesian statistics where predictive model comparison approaches are experiencing a resurgence. Unlike single-species occupancy models that use integrated likelihoods, MSOMs are usually couched in a Bayesian framework and contain multiple levels. Standard model checking and selection methods are often unreliable in this setting and there is only limited guidance in the ecological literature for this class of models. We examined several different contemporary Bayesian hierarchical approaches for checking and validating MSOMs and applied these methods to a freshwater aquatic study system in Colorado, USA, to better understand the diversity and distributions of plains fishes. Our findings indicated distinct differences among model selection approaches, with cross-validation techniques performing the best in terms of prediction.
Journal Article