Catalogue Search | MBRL
762 result(s) for "Cross-validation (statistics)"
Complex Population Dynamics
Why do organisms become extremely abundant one year and then seem to disappear a few years later? Why do population outbreaks in particular species happen more or less regularly in certain locations, but only irregularly (or never at all) in other locations? Complex population dynamics have fascinated biologists for decades. By bringing together mathematical models, statistical analyses, and field experiments, this book offers a comprehensive new synthesis of the theory of population oscillations. Peter Turchin first reviews the conceptual tools that ecologists use to investigate population oscillations, introducing population modeling and the statistical analysis of time series data. He then provides an in-depth discussion of several case studies (the larch budmoth, southern pine beetle, red grouse, voles and lemmings, snowshoe hare, and ungulates) to develop a new analysis of the mechanisms that drive population oscillations in nature. Through such work, the author argues, ecologists can develop general laws of population dynamics that will help turn ecology into a truly quantitative and predictive science. Complex Population Dynamics integrates theoretical and empirical studies into a major new synthesis of current knowledge about population dynamics. It is also a pioneering work that sets the course for ecology's future as a predictive science.
Matrices, Moments and Quadrature with Applications
2009,2010
This computationally oriented book describes and explains the mathematical relationships among matrices, moments, orthogonal polynomials, quadrature rules, and the Lanczos and conjugate gradient algorithms. The book bridges different mathematical areas to obtain algorithms to estimate bilinear forms involving two vectors and a function of the matrix. The first part of the book provides the necessary mathematical background and explains the theory. The second part describes the applications and gives numerical examples of the algorithms and techniques developed in the first part. Applications addressed in the book include computing elements of functions of matrices; obtaining estimates of the error norm in iterative methods for solving linear systems and computing parameters in least squares and total least squares; and solving ill-posed problems using Tikhonov regularization. This book will interest researchers in numerical linear algebra and matrix computations, as well as scientists and engineers working on problems involving computation of bilinear forms.
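The link between the Lanczos algorithm and Gauss quadrature that the book develops can be illustrated for the symmetric case u = v of the bilinear form u^T f(A) u. The following is a minimal NumPy sketch, assuming a symmetric matrix A and no reorthogonalization; the function name `lanczos_quadrature` is hypothetical and this is not code from the book. k Lanczos steps produce a tridiagonal matrix T_k whose eigenvalues act as quadrature nodes and whose squared first eigenvector components act as weights.

```python
import numpy as np


def lanczos_quadrature(A, u, k, f):
    """Estimate u^T f(A) u from k Lanczos steps (Gauss quadrature rule).

    Illustrative sketch; the name is hypothetical. Assumes A is symmetric;
    f is applied elementwise to eigenvalues (e.g. np.exp). No
    reorthogonalization is performed, so keep k modest in practice.
    """
    n = A.shape[0]
    norm_u = np.linalg.norm(u)
    q_prev = np.zeros(n)
    q = u / norm_u
    alphas, betas = [], []
    beta = 0.0
    for j in range(k):
        w = A @ q - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if j == k - 1 or beta < 1e-12:
            break  # last requested step, or an invariant subspace was found
        betas.append(beta)
        q_prev, q = q, w / beta
    # Symmetric tridiagonal (Jacobi) matrix T_k built from the Lanczos coefficients.
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    # Nodes = eigenvalues of T_k; weights = squared first eigenvector components.
    theta, V = np.linalg.eigh(T)
    return norm_u**2 * np.sum(V[0, :] ** 2 * f(theta))


# Usage: compare against the exact value from a full eigendecomposition.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T / 6 + np.eye(6)
u = rng.standard_normal(6)
evals, evecs = np.linalg.eigh(A)
exact = u @ (evecs * np.exp(evals)) @ evecs.T @ u
print(lanczos_quadrature(A, u, 3, np.exp), lanczos_quadrature(A, u, 6, np.exp), exact)
```

With k equal to the matrix order the estimate matches the exact value up to rounding; a smaller k gives the cheaper approximation that motivates the approach.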
Ecological Niches and Geographic Distributions (MPB-49)
by Enrique Martínez-Meyer, Richard G. Pearson, Miguel Nakamura
in Algorithm, American Museum of Natural History, Bastian
2011,2012
This book provides a first synthetic view of an emerging area of ecology and biogeography, linking individual- and population-level processes to geographic distributions and biodiversity patterns. Problems in evolutionary ecology, macroecology, and biogeography are illuminated by this integrative view. The book focuses on correlative approaches known as ecological niche modeling, species distribution modeling, or habitat suitability modeling, which use associations between known occurrences of species and environmental variables to identify environmental conditions under which populations can be maintained. The spatial distribution of environments suitable for the species can then be estimated: a potential distribution for the species. This approach has broad applicability to ecology, evolution, biogeography, and conservation biology, as well as to understanding the geographic potential of invasive species and infectious diseases, and the biological implications of climate change. The authors lay out conceptual foundations and general principles for understanding and interpreting species distributions with respect to geography and environment. The focus is on the development of niche models. While serving as a guide for students and researchers, the book also provides a theoretical framework to support future progress in the field.
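As a concrete, deliberately simple instance of the correlative approach described above, the sketch below fits a BIOCLIM-style rectilinear envelope: it records the range of each environmental variable at known occurrences and flags grid cells that fall inside every range as potentially suitable. This is a swapped-in illustrative technique, not the authors' niche-modeling method; the function names, the 5th/95th percentile cutoffs, and the toy data are assumptions.

```python
import numpy as np


def fit_envelope(occ_env, lower_q=5.0, upper_q=95.0):
    """Per-variable percentile range of the environment at known occurrences.

    Hypothetical helper. occ_env has shape (n_occurrences, n_variables);
    the default percentile cutoffs are an assumption.
    """
    occ_env = np.asarray(occ_env, dtype=float)
    lo = np.percentile(occ_env, lower_q, axis=0)
    hi = np.percentile(occ_env, upper_q, axis=0)
    return lo, hi


def predict_suitable(env_grid, lo, hi):
    """True for grid cells whose every variable falls inside the envelope."""
    env_grid = np.asarray(env_grid, dtype=float)
    return np.all((env_grid >= lo) & (env_grid <= hi), axis=1)


# Usage with hypothetical data: columns are mean temperature and annual rainfall.
occ = [[10, 100], [12, 120], [14, 140], [16, 160], [18, 180]]
lo, hi = fit_envelope(occ, lower_q=0, upper_q=100)
print(predict_suitable([[15, 150], [20, 150], [11, 90]], lo, hi))
```

The set of cells flagged as suitable plays the role of the "potential distribution"; real niche models replace the rectangular envelope with richer statistical fits.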
Neural Networks vs. Regression: A Comparative Analysis in Medical Data Processing
by Andor, Minodora; Mihalas, Gheorghe Ioan
in Accuracy, AI in Healthcare, Medical AI, Healthcare Statistics, Predictive Modeling, Medical Data Analysis, Sensitivity Analysis, AUC-ROC, Cross-validation, Model Validation, Federated Learning, Artificial intelligence
2025
Background and Aim: The increasing adoption of artificial intelligence (AI) in medical research has offered alternative methods for medical data processing. This study comparatively evaluated the predictive performance of feedforward neural network (FFNN) regression versus classical statistical regression analysis in estimating the risk of post-COVID-19 type 2 diabetes based on metabolic factors. The primary objective was to assess the applicability, advantages, and limitations of these approaches when applied to relatively small medical datasets. Materials and Methods: We started by analyzing a small dataset of 130 patient records with metabolic parameters [1]. The risk of post-COVID-19 type 2 diabetes (glycaemia at 4 and at 12 months post-COVID as a function of metabolic parameters) was predicted using both linear regression and FFNN. The regression model followed standard statistical guidelines, while the FFNN required optimization of hyperparameters, including the number of layers, activation functions, learning rate, and optimization algorithms. We extended the study using simulated data (a dataset of 300 patients) to further compare logistic regression with neural networks. Results: The classical regression models demonstrated stable performance with clear interpretability, offering well-defined coefficients and statistical significance measures. However, the FFNN did not yield superior predictive accuracy, and its performance varied significantly depending on the choice of hyperparameters. The optimization process for the NN required extensive trial and error, as no universal guidelines for parameter selection were applicable in this context. Discussion: Our findings highlight a real challenge in medical AI applications for data processing: when dealing with small datasets, neural networks do not necessarily outperform classical methods. Regression provided robust results with minimal computational effort, while the FFNN required complex tuning without a clear performance advantage. The use of simulated data revealed that NNs might be more effective on larger datasets with potential non-linear patterns, albeit with limited interpretability. Conclusion: AI-based models are indeed recommended for processing large and/or unstructured complex medical datasets. However, this study concludes that regression models are a more practical and reliable choice for small-scale medical predictions. Future work should explore hybrid models that combine interpretability with non-linear modeling capacity to optimize predictive accuracy in clinical settings.
Journal Article
Cross-Validation Visualized: A Narrative Guide to Advanced Methods
2024
This study delves into the multifaceted nature of cross-validation (CV) techniques in machine learning model evaluation and selection, underscoring the challenge of choosing the most appropriate method due to the plethora of available variants. It aims to clarify and standardize terminology such as sets, groups, folds, and samples pivotal in the CV domain, and introduces an exhaustive compilation of advanced CV methods like leave-one-out, leave-p-out, Monte Carlo, grouped, stratified, and time-split CV within a hold-out CV framework. Through graphical representations, the paper enhances the comprehension of these methodologies, facilitating more informed decision making for practitioners. It further explores the synergy between different CV strategies and advocates for a unified approach to reporting model performance by consolidating essential metrics. The paper culminates in a comprehensive overview of the CV techniques discussed, illustrated with practical examples, offering valuable insights for both novice and experienced researchers in the field.
Journal Article
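To make the terminology of the cross-validation guide concrete, here is a minimal NumPy sketch of how training and test indices could be generated for four of the variants it names (K-fold, leave-one-out, grouped, and time-split CV). The function names are hypothetical and this is not the paper's code; scikit-learn's `KFold`, `LeaveOneOut`, `LeaveOneGroupOut`, and `TimeSeriesSplit` classes provide maintained implementations of the same ideas.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Shuffled K-fold CV: each sample appears in exactly one test fold."""
    perm = np.random.default_rng(seed).permutation(n)
    for fold in np.array_split(perm, k):
        yield np.setdiff1d(perm, fold), np.sort(fold)


def leave_one_out_indices(n):
    """Leave-one-out CV is K-fold CV with k = n."""
    for i in range(n):
        yield np.delete(np.arange(n), i), np.array([i])


def leave_one_group_out_indices(groups):
    """Grouped CV: all samples of one group are held out together."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        yield np.flatnonzero(groups != g), np.flatnonzero(groups == g)


def time_split_indices(n, n_splits, min_train):
    """Time-split (forward-chaining) CV: train on the past, test on the next block.

    Samples are assumed to be in time order; leftover samples at the end
    that do not fill a whole block are not used for testing.
    """
    step = (n - min_train) // n_splits
    for s in range(n_splits):
        end = min_train + s * step
        yield np.arange(end), np.arange(end, end + step)


# Usage: print the test blocks of a forward-chaining split.
for train, test in time_split_indices(10, 3, 4):
    print(train.tolist(), test.tolist())
```

The grouped and time-split variants deliberately give up random shuffling so that correlated samples (same patient, same site, later time) never leak from the test side into training.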
Fast stable direct fitting and smoothness selection for generalized additive models
2008
Existing computationally efficient methods for penalized likelihood generalized additive model fitting employ iterative smoothness selection on working linear models (or working mixed models). Such schemes fail to converge for a non-negligible proportion of models, with failure being particularly frequent in the presence of concurvity. If smoothness selection is performed by optimizing 'whole model' criteria these problems disappear, but until now attempts to do this have employed finite-difference-based optimization schemes which are computationally inefficient and can suffer from false convergence. The paper develops the first computationally efficient method for direct generalized additive model smoothness selection. It is highly stable, but by careful structuring achieves a computational efficiency that leads, in simulations, to lower mean computation times than the schemes that are based on working model smoothness selection. The method also offers a reliable way of fitting generalized additive mixed models.
Journal Article
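A "whole model" smoothness-selection criterion can be illustrated with generalized cross-validation (GCV) for a Gaussian penalized regression, where the fit is beta_hat = (X^T X + lam S)^{-1} X^T y and GCV(lam) = n * RSS / (n - tr(A))^2, with A the influence matrix. The sketch below is a minimal illustration under that assumption (identity link, a single smoothing parameter); it uses a plain grid search over log(lam) rather than the paper's efficient direct optimization, and the function names and toy data are hypothetical.

```python
import numpy as np


def gcv_score(X, y, S, lam):
    """GCV(lam) = n * RSS / (n - tr(A))^2 for beta = (X'X + lam S)^{-1} X'y.

    Hypothetical helper for the Gaussian, identity-link case only.
    """
    n = len(y)
    A = X @ np.linalg.solve(X.T @ X + lam * S, X.T)  # influence (hat) matrix
    resid = y - A @ y
    return n * (resid @ resid) / (n - np.trace(A)) ** 2


def select_lambda_gcv(X, y, S, log_lams):
    """Grid search (not Newton's method) over log smoothing parameters."""
    scores = [gcv_score(X, y, S, np.exp(l)) for l in log_lams]
    best = int(np.argmin(scores))
    return float(np.exp(log_lams[best])), scores[best]


# Usage: Whittaker-type smoother (identity basis, second-difference penalty).
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(50)
D = np.diff(np.eye(50), n=2, axis=0)
lam, score = select_lambda_gcv(np.eye(50), y, D.T @ D, np.linspace(-5, 10, 61))
print(lam, score)
```

Because the criterion is a properly defined function of lam, it can be optimized directly; the paper's contribution is doing this efficiently and stably for full generalized additive models rather than by brute-force evaluation.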
A scalable estimate of the out-of-sample prediction error via approximate leave-one-out cross-validation
2020
The paper considers the problem of out-of-sample risk estimation in high-dimensional settings where standard techniques such as K-fold cross-validation suffer from large biases. Motivated by the low bias of the leave-one-out cross-validation method, we propose a computationally efficient closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires only minor computational overhead. With minor assumptions about the data-generating process, we obtain a finite sample upper bound for the difference between leave-one-out cross-validation and approximate leave-one-out cross-validation, |LO – ALO|. Our theoretical analysis illustrates that |LO – ALO| → 0 with overwhelming probability, when n, p → ∞, where the dimension p of the feature vectors may be comparable with or even greater than the number of observations, n. Despite the high dimensionality of the problem, our theoretical results do not require any sparsity assumption on the vector of regression coefficients. Our extensive numerical experiments show that |LO – ALO| decreases as n and p increase, revealing the excellent finite sample performance of approximate leave-one-out cross-validation. We further illustrate the usefulness of our proposed out-of-sample risk estimation method by an example of real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.
Journal Article
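In the special case of ridge regression with squared loss, the closed-form leave-one-out correction becomes the classical hat-matrix identity y_i - yhat_(-i) = (y_i - yhat_i) / (1 - H_ii), with H = X (X^T X + lam I)^{-1} X^T, and it is exact rather than approximate. The sketch below assumes that ridge setting with no intercept, computes the one-fit estimate, and checks it against brute-force leave-one-out refits in a p > n example. It is not the paper's general ALO formula for arbitrary regularizers, and the function names are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients (X'X + lam I)^{-1} X'y, no intercept (assumption)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def alo_ridge(X, y, lam):
    """Leave-one-out mean squared error from a single full-data ridge fit."""
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))  # exact LOO residuals for ridge
    return np.mean(loo_resid ** 2)


def brute_force_loo(X, y, lam):
    """Reference: refit the model n times, each time without one sample."""
    n = len(y)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        beta = ridge_fit(X[keep], y[keep], lam)
        errs.append((y[i] - X[i] @ beta) ** 2)
    return np.mean(errs)


# Usage: p > n, echoing the high-dimensional setting of the paper.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 50))
y = X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.standard_normal(30)
print(alo_ridge(X, y, lam=1.0), brute_force_loo(X, y, lam=1.0))
```

The single-fit version costs one matrix solve instead of n refits, which is the computational saving that motivates approximate leave-one-out methods in general.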
Predictive Inference with the Jackknife+
by Ramdas, Aaditya; Candès, Emmanuel J.; Barber, Rina Foygel
in Algorithms, Confidence intervals, Data points
2021
This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in the fitted regression function. Assuming exchangeable training samples, we prove that this crucial modification permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. Such guarantees are not possible for the original jackknife and we demonstrate examples where the coverage rate may actually vanish. Our theoretical and empirical analysis reveals that the jackknife and the jackknife+ intervals achieve nearly exact coverage and have similar lengths whenever the fitting algorithm obeys some form of stability. Further, we extend the jackknife+ to K-fold cross validation and similarly establish rigorous coverage properties. Our methods are related to cross-conformal prediction proposed by Vovk (Ann. Math. Artif. Intell. 74 (2015) 9–28) and we discuss connections.
Journal Article
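The jackknife+ construction described above can be sketched for ordinary least squares: fit n leave-one-out models, form mu_{-i}(x) - R_i and mu_{-i}(x) + R_i at the test point, then take the floor(alpha (n+1))-th smallest lower value and the ceil((1 - alpha)(n+1))-th smallest upper value. This is a minimal NumPy sketch assuming least squares as the fitting algorithm (which treats training points symmetrically); the function name is hypothetical and the code is not the authors' implementation. The paper's distribution-free guarantee for this interval is coverage of at least 1 - 2 alpha, with coverage near 1 - alpha typical for stable algorithms.

```python
import numpy as np


def jackknife_plus_interval(X, y, x_test, alpha=0.1):
    """Jackknife+ interval for least squares at a single test point (sketch)."""
    n = len(y)
    lower, upper = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)  # model mu_{-i}
        r_i = abs(y[i] - X[i] @ beta)   # leave-one-out residual R_i
        mu_i = x_test @ beta            # leave-one-out prediction mu_{-i}(x_test)
        lower[i], upper[i] = mu_i - r_i, mu_i + r_i
    k_lo = int(np.floor(alpha * (n + 1)))        # floor(alpha (n+1))-th smallest
    k_hi = int(np.ceil((1 - alpha) * (n + 1)))   # ceil((1-alpha)(n+1))-th smallest
    if k_lo < 1 or k_hi > n:
        return -np.inf, np.inf  # too few training points for this alpha
    return np.sort(lower)[k_lo - 1], np.sort(upper)[k_hi - 1]


# Usage: simple linear model with an intercept column.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(31), rng.standard_normal(31)])
y = 1 + 2 * X[:, 1] + rng.standard_normal(31)
print(jackknife_plus_interval(X[:30], y[:30], X[30], alpha=0.1), y[30])
```

Unlike the plain jackknife, which centers the interval on the full-data prediction, each bound here mixes a different leave-one-out prediction with its own residual, which is what absorbs the variability of the fitted regression function.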
Recursive partitioning for heterogeneous causal effects
2016
In this paper we propose methods for estimating heterogeneity in causal effects in experimental and observational studies and for conducting hypothesis tests about the magnitude of differences in treatment effects across subsets of the population. We provide a data-driven approach to partition the data into subpopulations that differ in the magnitude of their treatment effects. The approach enables the construction of valid confidence intervals for treatment effects, even with many covariates relative to the sample size, and without “sparsity” assumptions. We propose an “honest” approach to estimation, whereby one sample is used to construct the partition and another to estimate treatment effects for each subpopulation. Our approach builds on regression tree methods, modified to optimize for goodness of fit in treatment effects and to account for honest estimation. Our model selection criterion anticipates that bias will be eliminated by honest estimation and also accounts for the effect of making additional splits on the variance of treatment effect estimates within each subpopulation. We address the challenge that the “ground truth” for a causal effect is not observed for any individual unit, so that standard approaches to cross-validation must be modified. Through a simulation study, we show that for our preferred method honest estimation results in nominal coverage for 90% confidence intervals, whereas coverage ranges between 74% and 84% for nonhonest approaches. Honest estimation requires estimating the model with a smaller sample size; the cost in terms of mean squared error of treatment effects for our preferred method ranges from 7% to 22%.
Journal Article
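The "honest" idea (one sample builds the partition, a separate sample estimates the effects) can be illustrated with a one-split tree on a single covariate in a randomized experiment. In this minimal NumPy sketch the split criterion, the largest squared gap between the two leaves' difference-in-means estimates, is a simplified stand-in for the paper's criterion, which also accounts for estimation variance; the function names, the 50/50 sample split, the candidate thresholds, and the minimum leaf size are all assumptions.

```python
import numpy as np


def leaf_effect(y, w):
    """Difference in mean outcome between treated (w == 1) and control (w == 0)."""
    return y[w == 1].mean() - y[w == 0].mean()


def choose_split(x, y, w, candidates, min_per_arm=20):
    """Threshold whose two leaves differ most in estimated effect (simplified criterion)."""
    best, best_gap = None, -np.inf
    for c in candidates:
        left = x <= c
        if min(np.sum(leaf & (w == t)) for leaf in (left, ~left) for t in (0, 1)) < min_per_arm:
            continue  # a leaf lacks enough treated or control units
        gap = (leaf_effect(y[left], w[left]) - leaf_effect(y[~left], w[~left])) ** 2
        if gap > best_gap:
            best, best_gap = c, gap
    return best


def honest_one_split(x, y, w, seed=0):
    """Choose the split on one half of the data, estimate leaf effects on the other."""
    idx = np.random.default_rng(seed).permutation(len(y))
    tr, est = idx[: len(y) // 2], idx[len(y) // 2:]
    candidates = np.quantile(x[tr], np.linspace(0.1, 0.9, 17))
    c = choose_split(x[tr], y[tr], w[tr], candidates)
    if c is None:
        raise ValueError("no admissible split on the partition-building half")
    left = x[est] <= c
    ye, we = y[est], w[est]
    return c, leaf_effect(ye[left], we[left]), leaf_effect(ye[~left], we[~left])


# Usage with simulated data: the effect is 0 for x <= 0.5 and 2 above it.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0, 1, n)
w = rng.integers(0, 2, n)
y = x + w * np.where(x > 0.5, 2.0, 0.0) + 0.5 * rng.standard_normal(n)
print(honest_one_split(x, y, w))
```

Because the estimation half plays no role in choosing the threshold, the leaf-level difference in means on that half is not inflated by the search over splits, which is the source of the valid confidence intervals the abstract describes.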
Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models
2011
Recent work by Reiss and Ogden provides a theoretical basis for sometimes preferring restricted maximum likelihood (REML) to generalized cross-validation (GCV) for smoothing parameter selection in semiparametric regression. However, existing REML or marginal likelihood (ML) based methods for semiparametric generalized linear models (GLMs) use iterative REML or ML estimation of the smoothing parameters of working linear approximations to the GLM. Such indirect schemes need not converge and fail to do so in a non-negligible proportion of practical analyses. By contrast, very reliable smoothing parameter selection methods based on prediction error criteria are available; they directly optimize GCV, or related criteria, for the GLM itself. Since such methods directly optimize properly defined functions of the smoothing parameters, they have much more reliable convergence properties. The paper develops the first such method for REML or ML estimation of smoothing parameters. A Laplace approximation is used to obtain an approximate REML or ML for any GLM, which is suitable for efficient direct optimization. This REML or ML criterion requires that Newton-Raphson iteration, rather than Fisher scoring, be used for GLM fitting, and a computationally stable approach to this is proposed. The REML or ML criterion itself is optimized by a Newton method, with the derivatives required obtained by a mixture of implicit differentiation and direct methods. The method copes with numerical rank deficiency in the fitted model and in fact provides a slight improvement in numerical robustness over Wood's earlier method for smoothness selection based on prediction error criteria. Simulation results suggest that the new REML and ML methods offer some improvement in mean-square error performance relative to GCV or Akaike's information criterion in most cases, without the small number of severe undersmoothing failures to which Akaike's information criterion and GCV are prone. This is achieved at the same computational cost as GCV or Akaike's information criterion. The new approach also eliminates the convergence failures of previous REML- or ML-based approaches for penalized GLMs and usually has lower computational cost than these alternatives. Example applications are presented in adaptive smoothing, scalar on function regression and generalized additive model selection.
Journal Article
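In the Gaussian case the Laplace approximation used in the REML/ML paper is exact, so ML smoothing parameter selection can be sketched directly. The example below assumes a ridge-type model with beta ~ N(0, (sigma^2 / lam) I) and no unpenalized coefficients (so REML and ML coincide), profiles out sigma^2, and minimizes the resulting -2 log marginal likelihood over a grid of log(lam). It illustrates the criterion only, not the paper's Newton-based method for GLMs, and the function names and simulated data are hypothetical.

```python
import numpy as np


def neg2_profile_ml(X, y, lam):
    """-2 log marginal likelihood of y ~ N(0, sigma^2 (I + X X^T / lam)).

    Hypothetical helper; sigma^2 is replaced by its maximizing value.
    """
    n = len(y)
    K = np.eye(n) + X @ X.T / lam
    _, logdet = np.linalg.slogdet(K)
    sigma2 = y @ np.linalg.solve(K, y) / n  # profile estimate of sigma^2
    return n * np.log(2 * np.pi * sigma2) + logdet + n


def select_lambda_ml(X, y, log_lams):
    """Grid search over log smoothing parameters for the profiled criterion."""
    scores = [neg2_profile_ml(X, y, np.exp(l)) for l in log_lams]
    return float(np.exp(log_lams[int(np.argmin(scores))]))


# Usage: simulated data with sigma = 1 and coefficient sd 0.5, i.e. lam = 4.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta = rng.normal(0.0, 0.5, 50)
y = X @ beta + rng.standard_normal(200)
lam_hat = select_lambda_ml(X, y, np.linspace(-5, 8, 131))
print(lam_hat)
```

For non-Gaussian GLMs the marginal likelihood integral has no closed form, which is where the paper's Laplace approximation and stable Newton fitting come in.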