Catalogue Search | MBRL
106,737 result(s) for "Model performance"
Performance evaluation of global hydrological models in six large Pan-Arctic watersheds
by Satoh, Yusuke; Müller Schmied, Hannes; Krysanova, Valentina
in Algorithms; Climate and vegetation; Climate change
2020
Global Water Models (GWMs), which include Global Hydrological, Land Surface, and Dynamic Global Vegetation Models, present valuable tools for quantifying climate change impacts on hydrological processes in the data-scarce high latitudes. Here we performed a systematic model performance evaluation in six major Pan-Arctic watersheds for different hydrological indicators (monthly and seasonal discharge, extremes, trends (or lack thereof), and snow water equivalent (SWE)) via a novel Aggregated Performance Index (API) that is based on commonly used statistical evaluation metrics. The machine learning Boruta feature selection algorithm was used to evaluate the explanatory power of the API attributes. Our results show that the majority of the nine GWMs included in the study exhibit considerable difficulties in realistically representing Pan-Arctic hydrological processes. Average API_discharge (monthly and seasonal discharge) over nine GWMs exceeds 50% only in the Kolyma basin (55%), is as low as 30% in the Yukon basin, and averages 43% across all watersheds. WATERGAP2 and MATSIRO are the highest-performing GWMs over all watersheds (API_discharge > 55%), while ORCHIDEE and JULES-W1 are the lowest (API_discharge ≤ 25%). For the high and low flows, average API_extreme is 35% and 26%, respectively, and over six GWMs API_SWE is 57%. The Boruta algorithm suggests that using different observation-based climate data sets does not influence the total score of the APIs in any watershed. Ultimately, only satisfactory to good performing GWMs that effectively represent cold-region hydrological processes (including snow-related processes and permafrost) should be included in multi-model climate change impact assessments in Pan-Arctic watersheds.
Journal Article
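The abstract's core idea, aggregating several standard statistical metrics into a single percentage score, can be sketched as follows. The metric choice (NSE and percent bias), the clipping, and the equal-weight average are illustrative assumptions, not the study's actual API formula:

```python
# Illustrative sketch of an aggregated performance index (API):
# standard metrics are rescaled to [0, 1] and averaged into a
# single percentage score. The weighting is an assumption here.

def nse(obs, sim):
    """Nash-Sutcliffe efficiency (at most 1, unbounded below)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def pbias(obs, sim):
    """Percent bias between simulated and observed totals."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)

def aggregated_performance_index(obs, sim):
    """Average of metric scores rescaled to [0, 1], as a percentage."""
    nse_score = max(0.0, nse(obs, sim))                       # clip negative NSE to 0
    bias_score = max(0.0, 1.0 - abs(pbias(obs, sim)) / 100.0)  # 0 at |bias| >= 100%
    return 100.0 * (nse_score + bias_score) / 2.0

# Hypothetical monthly discharge values (observed vs. simulated).
obs = [10.0, 12.0, 9.0, 14.0, 11.0]
sim = [9.5, 12.5, 8.0, 13.0, 11.5]
api = aggregated_performance_index(obs, sim)
```

A real evaluation in the paper's spirit would compute such a score per watershed and per indicator group (discharge, extremes, SWE) before comparing models.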
Why we need lower-performance climate models
2024
All models are wrong, but models are not all equally wrong. Indeed, they can be wrong to different degrees and in entirely different ways. Here, we show that GCMs which are lower-performance (for particular tasks and applications) play a crucial role in climate science research. That is, lower-performance models help scientists gain knowledge they would otherwise lack, a point that is often underappreciated and has been under-theorized. More specifically, in the climate science literature, we see that lower-performance models help constrain the estimates of climate variables, lower-performance models provide data to test model weighting schemes, and lower-performance models serve as evidence to help resolve model-data discrepancies. This implies that (i) lower-performance models ought not be eliminated from analysis too hastily and (ii) the value of multi-model ensembles goes beyond exploring structural uncertainty and includes the counterintuitive generation of new knowledge via, in part, lower-performance models. As a result of (ii), model intercomparison efforts require reappraisal, particularly when deciding how to allocate modeling resources.
Journal Article
A Probability-Based Models Ranking Approach: An Alternative Method of Machine-Learning Model Performance Assessment
2022
Performance measures are crucial in selecting the best machine learning model for a given problem. Estimating classical model performance measures by subsampling methods like bagging or cross-validation has several weaknesses, the most important being the inability to test the significance of differences and the lack of interpretability. The recently proposed Elo-based Predictive Power (EPP), a meta-measure of machine learning model performance, is an attempt to address these weaknesses. However, the EPP is based on incorrect assumptions, so its estimates may not be reliable. This paper introduces the Probability-based Ranking Model Approach (PMRA), a modified EPP approach with a correction that makes its estimates more reliable. PMRA is based on calculating the probability that one model achieves a better result than another, using a mixed-effects logistic regression model. The empirical analysis was carried out on a real mortgage credit dataset. It included a comparison of how PMRA and state-of-the-art k-fold cross-validation ranked 49 machine learning models, an example application of the novel method to a hyperparameter tuning problem, and a comparison of PMRA and EPP indications. PMRA makes it possible to compare a newly developed algorithm to state-of-the-art algorithms on statistical criteria, to select the best hyperparameter configuration, and to formulate criteria for continuing the search of the hyperparameter space.
Journal Article
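The pairwise idea behind this kind of ranking, estimating the probability that one model beats another from per-fold scores, can be sketched minimally. PMRA itself fits a mixed-effects logistic regression to such comparisons; the raw empirical frequency below is only the simplest stand-in, and the scores are hypothetical:

```python
# Minimal sketch of probability-based pairwise model comparison:
# estimate P(model A outscores model B) from fold-level results.
from itertools import product

def win_probability(scores_a, scores_b):
    """Fraction of fold pairs in which model A strictly outscores B."""
    wins = sum(a > b for a, b in product(scores_a, scores_b))
    return wins / (len(scores_a) * len(scores_b))

# Hypothetical AUC scores from repeated cross-validation folds.
model_a = [0.81, 0.79, 0.83, 0.80, 0.82]
model_b = [0.78, 0.80, 0.77, 0.79, 0.76]
p = win_probability(model_a, model_b)
```

Unlike a single cross-validated mean, a probability like this supports a direct statistical statement ("A beats B with probability p"), which is the interpretability gain the abstract points to.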
Scientific and Human Errors in a Snow Model Intercomparison
2021
Twenty-seven models participated in the Earth System Model–Snow Model Intercomparison Project (ESM-SnowMIP), the most data-rich MIP dedicated to snow modeling. Our findings do not support the hypothesis advanced by previous snow MIPs: evaluating models against more variables and providing evaluation datasets extended temporally and spatially does not facilitate identification of key new processes requiring improvement to model snow mass and energy budgets, even at point scales. In fact, the same modeling issues identified by previous snow MIPs arose: albedo is a major source of uncertainty, surface exchange parameterizations are problematic, and individual model performance is inconsistent. This lack of progress is attributed partly to the large number of human errors that led to anomalous model behavior and to numerous resubmissions. It is unclear how widespread such errors are in our field and others; dedicated time and resources will be needed to tackle this issue to prevent highly sophisticated models and their research outputs from being vulnerable because of avoidable human mistakes. The design of and the data available to successive snow MIPs were also questioned. Evaluation of models against bulk snow properties was found to be sufficient for some but inappropriate for more complex snow models whose skills at simulating internal snow properties remained untested. Discussions between the authors of this paper on the purpose of MIPs revealed varied, and sometimes contradictory, motivations behind their participation. These findings started a collaborative effort to adapt future snow MIPs to respond to the diverse needs of the community.
Journal Article
Linking Large-Scale Double-ITCZ Bias to Local-Scale Drizzling Bias in Climate Models
2022
Tropical precipitation in climate models presents significant biases in both the large-scale pattern (i.e., double intertropical convergence zone bias) and local-scale characteristics (i.e., drizzling bias with too frequent drizzle/convection and reduced occurrences of no and heavy precipitation). By untangling the coupled system and analyzing the biases in precipitation, cloud, and radiation, this study shows that local-scale drizzling bias in atmospheric models can lead to large-scale double-ITCZ bias in coupled models by inducing convective-regime-dependent biases in precipitation and cloud radiative effects (CRE). The double-ITCZ bias consists of a hemispherically asymmetric component that arises from the asymmetric SST bias and a nearly symmetric component that exists in atmospheric models without the SST bias. By increasing light rain but reducing heavy rain, local-scale drizzling bias induces positive (negative) precipitation bias in the moderate (strong) convective regime, leading to the nearly symmetric wet bias in atmospheric models. By affecting the cloud profile, local-scale drizzling bias induces positive (negative) CRE bias in the stratocumulus (convective) regime in atmospheric models. Because the stratocumulus (convective) region is climatologically more pronounced in the southern (northern) tropics, the CRE bias is deemed to be hemispherically asymmetric and drives warm and wet (cold and dry) biases in the southern (northern) tropics when coupled to ocean. Our results suggest that correcting local-scale drizzling bias is critical for fixing large-scale double-ITCZ bias. The drizzling and double-ITCZ biases are not alleviated in models with mesoscale (0.25°–0.5°) or even storm-resolving (∼3 km) resolution, implying that either large-eddy simulation or fundamental improvement in small-scale subgrid parameterizations is needed.
Journal Article
A comprehensive evaluation of predictive performance of 33 species distribution models at species and community levels
by Soininen, Janne; Hui, Francis K.C.; Vanhatalo, Jarno
in Applications; Biodiversity and Ecology; Calibration
2019
A large array of species distribution model (SDM) approaches has been developed for explaining and predicting the occurrences of individual species or species assemblages. Given the wealth of existing models, it is unclear which models perform best for interpolation or extrapolation of existing data sets, particularly when one is concerned with species assemblages. We compared the predictive performance of 33 variants of 15 widely applied and recently emerged SDMs in the context of multispecies data, including both joint SDMs that model multiple species together, and stacked SDMs that model each species individually combining the predictions afterward. We offer a comprehensive evaluation of these SDM approaches by examining their performance in predicting withheld empirical validation data of different sizes representing five different taxonomic groups, and for prediction tasks related to both interpolation and extrapolation. We measure predictive performance by 12 measures of accuracy, discrimination power, calibration, and precision of predictions, for the biological levels of species occurrence, species richness, and community composition. Our results show large variation among the models in their predictive performance, especially for communities comprising many species that are rare. The results do not reveal any major trade‐offs among measures of model performance; the same models performed generally well in terms of accuracy, discrimination, and calibration, and for the biological levels of individual species, species richness, and community composition. In contrast, the models that gave the most precise predictions were not well calibrated, suggesting that poorly performing models can make overconfident predictions. However, none of the models performed well for all prediction tasks. 
As a general strategy, we therefore propose that researchers fit a small set of models showing complementary performance, and then apply a cross‐validation procedure involving separate data to establish which of these models performs best for the goal of the study.
Journal Article
Process-Oriented Diagnostics
by Gettelman, Andrew; Ullrich, Paul; Ming, Yi
in Atmosphere-ocean interaction; Best practices
2023
Process-oriented diagnostics (PODs) aim to provide feedback for model developers through model analysis based on physical hypotheses. However, the step from a diagnostic based on relationships among variables, even when hypothesis driven, to specific guidance for revising model formulation or parameterizations can be substantial. The POD may provide more information than a purely performance-based metric, but a gap between POD principles and providing actionable information for specific model revisions can remain. Furthermore, in coordinating diagnostics development, there is a trade-off between freedom for the developer, aiming to capture innovation, and near-term utility to the modeling center. Best practices that allow for the former, while conforming to specifications that aid the latter, are important for community diagnostics development that leads to tangible model improvements. Promising directions to close the gap between principles and practice include the interaction of PODs with perturbed physics experiments and with more quantitative process models as well as the inclusion of personnel from modeling centers in diagnostics development groups for immediate feedback during climate model revisions. Examples are provided, along with best-practice recommendations, based on practical experience from the NOAA Model Diagnostics Task Force (MDTF). Common standards for metrics and diagnostics that have arisen from a collaboration between the MDTF and the Department of Energy’s Coordinated Model Evaluation Capability are advocated as a means of uniting community diagnostics efforts.
Journal Article
Clinical prediction models: diagnosis versus prognosis
by van Smeden, Maarten; Reitsma, Johannes B.; Collins, Gary S.
in Calibration; Clinical decision making; Decision making
2021
Clinical prediction models play an increasingly important role in contemporary clinical care, by informing healthcare professionals, patients and their relatives about outcome risks, with the aim of facilitating (shared) medical decision making and improving health outcomes. Diagnostic prediction models aim to calculate an individual's risk that a disease is already present, whilst prognostic prediction models aim to calculate the risk of particular health states occurring in the future. This article serves as a primer for diagnostic and prognostic clinical prediction models, by discussing the basic terminology, some of the inherent challenges, and the need for validation of predictive performance and evaluation of the impact of these models in clinical care.
Journal Article
Climate Impacts of Convective Cloud Microphysics in NCAR CAM5
by Shan, Yunpeng; Lin, Lin; Fu, Qiang
in Aerosols; Atmospheric models; Atmospheric precipitations
2023
We improved the treatments of convective cloud microphysics in the NCAR Community Atmosphere Model version 5.3 (CAM5.3) by 1) implementing new terminal velocity parameterizations for convective ice and snow particles, 2) adding graupel microphysics, 3) considering convective snow detrainment, and 4) enhancing the rain initiation and generation rate in warm clouds. We evaluated the impacts of improved microphysics on simulated global climate, focusing on simulated cloud radiative forcing, graupel microphysics, convective cloud ice amount, and tropical precipitation. Compared to CAM5.3 with the default convective microphysics, the too-strong cloud shortwave radiative forcing due primarily to excessive convective cloud liquid is largely alleviated over the tropics and midlatitudes after the rain initiation and generation rate is enhanced, in better agreement with the CERES-EBAF estimates. Geographic distributions of graupel occurrence are reasonably simulated over continents, whereas the graupel occurrence remains highly uncertain over the oceanic storm-track regions. When evaluated against the CloudSat–CALIPSO estimates, the overestimation of convective ice mass is alleviated with the improved convective ice microphysics, among which adding graupel microphysics and the accompanying increase in hydrometeor fall speed play the most important role. The probability distribution function (PDF) of rainfall intensity is sensitive to warm rain processes in convective clouds, and enhancement in warm rain production shifts the PDF toward heavier precipitation, which agrees better with the TRMM observations. Common biases of overestimating the light rain frequency and underestimating the heavy rain frequency in GCMs are mitigated.
Journal Article
Evaluation of CMIP6 models toward dynamical downscaling over 14 CORDEX domains
2024
Both reliability and independence of global climate model (GCM) simulations are essential for model selection to generate a reasonable uncertainty range of dynamical downscaling simulations. In this study, we evaluate the performance and interdependency of 37 GCMs from the Coupled Model Intercomparison Project Phase 6 (CMIP6) in terms of seven key large-scale driving fields over 14 CORDEX domains. A multivariable integrated evaluation method is used to evaluate and rank the models' ability to simulate multiple variables in terms of their climatological mean and interannual variability. The results suggest that model performance varies considerably with seasons, domains, and variables evaluated, and no model outperforms the others in all aspects. However, the multi-model ensemble mean performs better than almost all individual models. Among the 37 CMIP6 models, MPI-ESM1-2-HR and FIO-ESM-2-0 rank in the top two owing to their overall good performance across all domains. To measure model interdependency in terms of multiple fields, we define the similarity of multivariate error fields between pairwise models. Our results indicate that dependence exists between most of the CMIP6 models, and models sharing the same ideas and/or concepts generally show less independence. Furthermore, we hierarchically cluster the top 15 models with good performance based on the similarity of multivariate error fields to identify relatively independent models. Our evaluation can provide useful guidance on the selection of CMIP6 models based on their performance and relative independence, which helps to generate a more reliable ensemble of dynamical downscaling simulations with reasonable inter-model spread.
Journal Article
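The interdependency idea in the last abstract, treating two models as less independent when their errors against observations look alike, can be sketched with a single-variable stand-in. The paper defines a multivariate similarity of error fields; a plain Pearson correlation of one-dimensional error fields, on hypothetical model outputs, is used here only to illustrate the principle:

```python
# Sketch of model-interdependence screening: correlate the error
# fields (model minus observation) of pairs of models. Strongly
# correlated errors suggest shared biases, i.e. less independence.

def error_field(model, obs):
    return [m - o for m, o in zip(model, obs)]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

obs     = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [1.2, 2.1, 3.3, 4.1, 5.2]  # hypothetical GCM outputs
model_b = [1.1, 2.1, 3.2, 4.1, 5.1]  # shares model_a's error pattern
model_c = [0.8, 2.2, 2.9, 4.3, 4.9]  # errors differ in sign

sim_ab = pearson(error_field(model_a, obs), error_field(model_b, obs))
sim_ac = pearson(error_field(model_a, obs), error_field(model_c, obs))
```

A screening like this, extended to multiple variables and followed by hierarchical clustering of the similarity matrix, is the kind of procedure the abstract describes for picking relatively independent models.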