Catalogue Search | MBRL
5,305 result(s) for "cross validation"
Cross-Validation Visualized: A Narrative Guide to Advanced Methods
2024
This study delves into the multifaceted nature of cross-validation (CV) techniques in machine learning model evaluation and selection, underscoring the challenge of choosing the most appropriate method due to the plethora of available variants. It aims to clarify and standardize terminology such as sets, groups, folds, and samples pivotal in the CV domain, and introduces an exhaustive compilation of advanced CV methods like leave-one-out, leave-p-out, Monte Carlo, grouped, stratified, and time-split CV within a hold-out CV framework. Through graphical representations, the paper enhances the comprehension of these methodologies, facilitating more informed decision making for practitioners. It further explores the synergy between different CV strategies and advocates for a unified approach to reporting model performance by consolidating essential metrics. The paper culminates in a comprehensive overview of the CV techniques discussed, illustrated with practical examples, offering valuable insights for both novice and experienced researchers in the field.
Journal Article
Objective Quantification of In-Hospital Patient Mobilization after Cardiac Surgery Using Accelerometers: Selection, Use, and Analysis
by Veltink, Peter H.; van Delden, Robby W.; Halfwerk, Frank R.
in Accelerometry, activity classification, Cardiac Surgical Procedures
2021
Cardiac surgery patients infrequently mobilize during their hospital stay. It is unclear for patients why mobilization is important, and exact progress of mobilization activities is not available. The aim of this study was to select and evaluate accelerometers for objective qualification of in-hospital mobilization after cardiac surgery. Six static and dynamic patient activities were defined to measure patient mobilization during the postoperative hospital stay. Device requirements were formulated, and the available devices reviewed. A triaxial accelerometer (AX3, Axivity) was selected for a clinical pilot in a heart surgery ward and placed on both the upper arm and upper leg. An artificial neural network algorithm was applied to classify lying in bed, sitting in a chair, standing, walking, cycling on an exercise bike, and walking the stairs. The primary endpoint was the daily amount of each activity performed between 7 a.m. and 11 p.m. The secondary endpoints were length of intensive care unit stay and surgical ward stay. A subgroup analysis for male and female patients was planned. In total, 29 patients were classified after cardiac surgery with an intensive care unit stay of 1 (1 to 2) night and surgical ward stay of 5 (3 to 6) nights. Patients spent 41 (20 to 62) min less time in bed for each consecutive hospital day, as determined by a mixed-model analysis (p < 0.001). Standing, walking, and walking the stairs increased during the hospital stay. No differences between men (n = 22) and women (n = 7) were observed for all endpoints in this study. The approach presented in this study is applicable for measuring all six activities and for monitoring postoperative recovery of cardiac surgery patients. A next step is to provide feedback to patients and healthcare professionals, to speed up recovery.
Journal Article
Fast stable direct fitting and smoothness selection for generalized additive models
2008
Existing computationally efficient methods for penalized likelihood generalized additive model fitting employ iterative smoothness selection on working linear models (or working mixed models). Such schemes fail to converge for a non-negligible proportion of models, with failure being particularly frequent in the presence of concurvity. If smoothness selection is performed by optimizing 'whole model' criteria these problems disappear, but until now attempts to do this have employed finite-difference-based optimization schemes which are computationally inefficient and can suffer from false convergence. The paper develops the first computationally efficient method for direct generalized additive model smoothness selection. It is highly stable, but by careful structuring achieves a computational efficiency that leads, in simulations, to lower mean computation times than the schemes that are based on working model smoothness selection. The method also offers a reliable way of fitting generalized additive mixed models.
Journal Article
Cross‐validated permutation feature importance considering correlation between features
2022
In molecular design, material design, process design, and process control, it is important not only to construct a model with high predictive ability between explanatory features x and objective features y using a dataset but also to interpret the constructed model. An index of feature importance in x is permutation feature importance (PFI), which can be combined with any regressors and classifiers. However, the PFI becomes unstable when the number of samples is low because it is necessary to divide a dataset into training and validation data when calculating it. Additionally, when there are strongly correlated features in x, the PFI of these features is estimated to be low. Hence, a cross‐validated PFI (CVPFI) method is proposed. CVPFI can be calculated stably, even with a small number of samples, because model construction and feature evaluation are repeated based on cross‐validation. Furthermore, by considering the absolute correlation coefficients between the features, the feature importance can be evaluated appropriately even when there are strongly correlated features in x. Case studies using numerical simulation data and actual compound data showed that the feature importance can be evaluated appropriately using CVPFI compared to PFI. This is possible when the number of samples is low, when linear and nonlinear relationships are mixed between x and y, when there are strong correlations between features in x, and when quantised and biased features exist in x. Python codes for CVPFI are available at https://github.com/hkaneko1985/dcekit.
Journal Article
Model selection using information criteria, but is the "best" model any good?
by Thomson, James R.; Duncan, Richard P.; Mac Nally, Ralph
in Adequacy, applied ecology, COMMENTARY
2018
1. Information criteria (ICs) are used widely for data summary and model building in ecology, especially in applied ecology and wildlife management. Although ICs are useful for distinguishing among rival candidate models, ICs do not necessarily indicate whether the "best" model (or a model-averaged version) is a good representation of the data or whether the model has useful "explanatory" or "predictive" ability. 2. As editors and reviewers, we have seen many submissions that did not evaluate whether the nominal "best" model(s) found using IC is a useful model in the above sense. 3. We scrutinized six leading ecological journals for papers that used IC to compare models. More than half of papers using IC for model comparison did not evaluate the adequacy of the best model(s) in either "explaining" or "predicting" the data. 4. Synthesis and applications. Authors need to evaluate the adequacy of the model identified as the "best" model by using information criteria methods to provide convincing evidence to readers and users that inferences from the best models are useful and reliable.
Journal Article
Hybrid breeding of rice via genomic selection
by Li, Ruidong; Ali, Jauhar; Cui, Yanru
in Agronomy, best linear unbiased prediction, biotechnology
2020
Hybrid breeding is the main strategy for improving productivity in many crops, especially in rice and maize. Genomic hybrid breeding is a technology that uses whole‐genome markers to predict future hybrids. Predicted superior hybrids are then field evaluated and released as new hybrid cultivars after their superior performances are confirmed. This will increase the opportunity of selecting true superior hybrids with minimum costs. Here, we used genomic best linear unbiased prediction to perform hybrid performance prediction using an existing rice population of 1495 hybrids. Replicated 10‐fold cross‐validations showed that the prediction abilities on ten agronomic traits ranged from 0.35 to 0.92. Using the 1495 rice hybrids as a training sample, we predicted six agronomic traits of 100 hybrids derived from half diallel crosses involving 21 parents that are different from the parents of the hybrids in the training sample. The prediction abilities were relatively high, varying from 0.54 (yield) to 0.92 (grain length). We concluded that the current population of 1495 hybrids can be used to predict hybrids from seemingly unrelated parents. Eventually, we used this training population to predict all potential hybrids of cytoplasm male sterile lines from 3000 rice varieties from the 3K Rice Genome Project. Using a breeding index combining 10 traits, we identified the top and bottom 200 predicted hybrids. SNP genotypes of the training population and parameters estimated from this training population are available for general uses and further validation in genomic hybrid prediction of all potential hybrids generated from all varieties of rice.
Journal Article
A scalable estimate of the out-of-sample prediction error via approximate leave-one-out cross-validation
2020
The paper considers the problem of out-of-sample risk estimation under the high dimensional settings where standard techniques such as K-fold cross-validation suffer from large biases. Motivated by the low bias of the leave-one-out cross-validation method, we propose a computationally efficient closed form approximate leave-one-out formula ALO for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires a minor computational overhead. With minor assumptions about the data-generating process, we obtain a finite sample upper bound for the difference between leave-one-out cross-validation and approximate leave-one-out cross-validation, |LO – ALO|. Our theoretical analysis illustrates that |LO – ALO| → 0 with overwhelming probability, when n, p → ∞, where the dimension p of the feature vectors may be comparable with or even greater than the number of observations, n. Despite the high dimensionality of the problem, our theoretical results do not require any sparsity assumption on the vector of regression coefficients. Our extensive numerical experiments show that |LO – ALO| decreases as n and p increase, revealing the excellent finite sample performance of approximate leave-one-out cross-validation. We further illustrate the usefulness of our proposed out-of-sample risk estimation method by an example of real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.
Journal Article
Evidence for Missing Geomagnetic Reversals From Geomagnetic Reversal Frequency Model Using Adaptive Kernel Density Estimation
2026
The existence of missing geomagnetic reversals has been proposed, with potential for new magnetostratigraphic age controls. We estimate geomagnetic reversal frequency from 0 to 155 Ma using adaptive‐bandwidth kernel density estimation (AKDE) to evaluate data sparseness and to assess how reversal frequency changes when recently identified geomagnetic reversals are incorporated into the geomagnetic polarity time scale (GPTS) data set. AKDE is a two‐stage procedure that uses an initial density estimator based on an initial (pilot) bandwidth. We found that the pilot bandwidth determined using cross‐validation is stable with respect to data set length. The AKDE results obtained based on the cross‐validated pilot bandwidth reveal four troughs after the Cretaceous Normal Superchron, spaced 13.5–15.0 Myr apart and corresponding to relatively long chrons (>0.8 Myr). One trough near 32 Ma becomes less distinct after the four recently identified reversals are added to the data set. This sensitivity suggests that troughs in the frequency curve may indicate missing geomagnetic reversals.
Journal Article
Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models
2011
Recent work by Reiss and Ogden provides a theoretical basis for sometimes preferring restricted maximum likelihood (REML) to generalized cross-validation (GCV) for smoothing parameter selection in semiparametric regression. However, existing REML or marginal likelihood (ML) based methods for semiparametric generalized linear models (GLMs) use iterative REML or ML estimation of the smoothing parameters of working linear approximations to the GLM. Such indirect schemes need not converge and fail to do so in a non-negligible proportion of practical analyses. By contrast, very reliable prediction error criteria smoothing parameter selection methods are available, based on direct optimization of GCV, or related criteria, for the GLM itself. Since such methods directly optimize properly defined functions of the smoothing parameters, they have much more reliable convergence properties. The paper develops the first such method for REML or ML estimation of smoothing parameters. A Laplace approximation is used to obtain an approximate REML or ML for any GLM, which is suitable for efficient direct optimization. This REML or ML criterion requires that Newton-Raphson iteration, rather than Fisher scoring, be used for GLM fitting, and a computationally stable approach to this is proposed. The REML or ML criterion itself is optimized by a Newton method, with the derivatives required obtained by a mixture of implicit differentiation and direct methods. The method will cope with numerical rank deficiency in the fitted model and in fact provides a slight improvement in numerical robustness on the earlier method of Wood for prediction error criteria based smoothness selection. Simulation results suggest that the new REML and ML methods offer some improvement in mean-square error performance relative to GCV or Akaike's information criterion in most cases, without the small number of severe undersmoothing failures to which Akaike's information criterion and GCV are prone. 
This is achieved at the same computational cost as GCV or Akaike's information criterion. The new approach also eliminates the convergence failures of previous REML- or ML-based approaches for penalized GLMs and usually has lower computational cost than these alternatives. Example applications are presented in adaptive smoothing, scalar on function regression and generalized additive model selection.
Journal Article
Model selection and assessment for multi-species occupancy models
by Broms, Kristin M.; Fitzpatrick, Ryan M.; Hooten, Mevin B.
in Animals, Bayes Theorem, Bayesian analysis
2016
While multi-species occupancy models (MSOMs) are emerging as a popular method for analyzing biodiversity data, formal checking and validation approaches for this class of models have lagged behind. Concurrent with the rise in application of MSOMs among ecologists, a quiet regime shift is occurring in Bayesian statistics where predictive model comparison approaches are experiencing a resurgence. Unlike single-species occupancy models that use integrated likelihoods, MSOMs are usually couched in a Bayesian framework and contain multiple levels. Standard model checking and selection methods are often unreliable in this setting and there is only limited guidance in the ecological literature for this class of models. We examined several different contemporary Bayesian hierarchical approaches for checking and validating MSOMs and applied these methods to a freshwater aquatic study system in Colorado, USA, to better understand the diversity and distributions of plains fishes. Our findings indicated distinct differences among model selection approaches, with cross-validation techniques performing the best in terms of prediction.
Journal Article