Search Results

110 results for "Distributional regression"
BAMLSS: Bayesian Additive Models for Location, Scale, and Shape (and Beyond)
Bayesian analysis provides a convenient setting for the estimation of complex generalized additive regression models (GAMs). Since computational power has tremendously increased in the past decade, it is now possible to tackle complicated inferential problems, for example, with Markov chain Monte Carlo simulation, on virtually any modern computer. This is one of the reasons why Bayesian methods have become increasingly popular, leading to a number of highly specialized and optimized estimation engines, with attention shifting from conditional mean models to probabilistic distributional models capturing location, scale, shape (and other aspects) of the response distribution. To embed many different approaches suggested in the literature and in software, a unified modeling architecture for distributional GAMs is established that exploits distributions, estimation techniques (posterior mode or posterior mean), and model terms (fixed, random, smooth, spatial, ...). It is shown that, within this framework, implementing algorithms for complex regression problems, as well as integrating already existing software, is relatively straightforward. The usefulness is emphasized with two complex and computationally demanding application case studies: a large daily precipitation climatology and a Cox model for continuous time with space-time interactions. Supplementary material for this article is available online.
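To make the setup concrete, below is a minimal, purely illustrative sketch using the bamlss R package described in this paper; the data frame d and covariate x are placeholders, not the paper's applications.

    # Gaussian distributional GAM: both the mean and the scale depend smoothly on x.
    # 'd' is a hypothetical data frame with columns y and x.
    library(bamlss)
    f <- list(
      y     ~ s(x),   # location (mu)
      sigma ~ s(x)    # scale, modelled on a log link by default
    )
    set.seed(1)
    b <- bamlss(f, family = "gaussian", data = d)   # posterior mode warm start, then MCMC
    summary(b)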
Smoothing Parameter and Model Selection for General Smooth Models
This article discusses a general framework for smoothing parameter estimation for models with regular likelihoods constructed in terms of unknown smooth functions of covariates. Gaussian random effects and parametric terms may also be present. By construction the method is numerically stable and convergent, and enables smoothing parameter uncertainty to be quantified. The latter enables us to fix a well-known problem with AIC for such models, thereby improving the range of model selection tools available. The smooth functions are represented by reduced-rank, spline-like smoothers, with associated quadratic penalties measuring function smoothness. Model estimation is by penalized likelihood maximization, where the smoothing parameters controlling the extent of penalization are estimated by Laplace approximate marginal likelihood. The methods cover, for example, generalized additive models for nonexponential family responses (e.g., beta, ordered categorical, scaled t, negative binomial, and Tweedie distributions), generalized additive models for location, scale and shape (e.g., two-stage zero-inflation models and Gaussian location-scale models), Cox proportional hazards models, and multivariate additive models. The framework reduces the implementation of new model classes to the coding of some standard derivatives of the log-likelihood. Supplementary materials for this article are available online.
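One widely used implementation of this framework is the mgcv R package (not named in the abstract); a minimal sketch of a Gaussian location-scale GAM with smoothing parameters estimated by restricted marginal likelihood, using a hypothetical data frame d:

    library(mgcv)
    # Two linear predictors: one for the mean of y, one for its scale parameter.
    m <- gam(list(y ~ s(x1),   # mean
                  ~ s(x2)),    # scale
             family = gaulss(), data = d, method = "REML")
    summary(m)
    AIC(m)   # mgcv reports an AIC that accounts for smoothing parameter estimation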
A review of predictive uncertainty estimation with machine learning
Predictions and forecasts of machine learning models should take the form of probability distributions, aiming to increase the quantity of information communicated to end users. Although applications of probabilistic prediction and forecasting with machine learning models in academia and industry are becoming more frequent, related concepts and methods have not been formalized and structured under a holistic view of the entire field. Here, we review the topic of predictive uncertainty estimation with machine learning algorithms, as well as the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review covers a time period spanning from the introduction of early statistical methods (linear regression and time series models, based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms) that are more flexible by nature. The review of progress in the field expedites our understanding of how to develop new algorithms tailored to users' needs, since the latest advancements are based on some fundamental concepts applied to more complex algorithms. We conclude by classifying the material and discussing challenges that are becoming a hot topic of research.
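As a concrete instance of the consistent scoring functions mentioned above, the pinball (quantile) loss for a quantile forecast takes only a few lines of R; the numbers below are invented for illustration.

    # Pinball (quantile) loss: the consistent scoring function for the tau-quantile.
    pinball <- function(y, q, tau) {
      ifelse(y >= q, tau * (y - q), (1 - tau) * (q - y))
    }
    # Hypothetical example: observed value 3.2, two competing 0.9-quantile forecasts.
    pinball(y = 3.2, q = 4.0, tau = 0.9)   # 0.08 -- closer forecast, lower loss
    pinball(y = 3.2, q = 6.0, tau = 0.9)   # 0.28 -- wider forecast, higher loss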
Comparison between Highly Complex Location Models and GAMLSS
This paper presents a discussion regarding regression models, especially those belonging to the location class. Our main motivation is that, with simple distributions having simple interpretations, in some cases one gets better results than those obtained with overly complex distributions. For instance, with the reverse Gumbel (RG) distribution, it is possible to explain response variables by making use of the generalized additive models for location, scale, and shape (GAMLSS) framework, which allows the fitting of several parameters (characteristics) of the probabilistic distributions, like mean, mode, variance, and others. Three real data applications are used to compare several location models against the RG under the GAMLSS framework. The intention is to show that the use of a simple distribution (e.g., RG) based on a more sophisticated regression structure may be preferable to using a more complex location model.
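The comparison described can be sketched (illustratively, not with the paper's data) using the gamlss R package, pitting the reverse Gumbel (RG) family against a heavier-tailed competitor on the same regression structure:

    library(gamlss)
    # 'd' is a hypothetical data frame with response y and covariate x.
    m_rg <- gamlss(y ~ x, sigma.formula = ~ x, family = RG, data = d)  # reverse Gumbel
    m_tf <- gamlss(y ~ x, sigma.formula = ~ x, family = TF, data = d)  # t family
    GAIC(m_rg, m_tf)   # generalized AIC: the simpler RG model may come out ahead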
Assessing the impact of variance heterogeneity and misspecification in mixed-effects location-scale models
Purpose: The linear mixed model (LMM) is a common statistical approach to model the relation between exposure and outcome while capturing individual variability through random effects. However, this model assumes homogeneity of the error term's variance. Violating this assumption, known as homoscedasticity, can bias estimates and, consequently, may change a study's conclusions. If this assumption is unmet, the mixed-effects location-scale model (MELSM) offers a solution to account for within-individual variability. Methods: Our work explores how LMMs and MELSMs behave when the homoscedasticity assumption is not met. Further, we study how misspecification affects inference for the MELSM. To this end, we conduct a simulation study with longitudinal data and evaluate the bias and coverage of the estimates. Results: Our simulations show that neglecting heteroscedasticity in LMMs leads to a loss of coverage for the estimated coefficients and biases the estimates of the standard deviations of the random effects. In MELSMs, scale misspecification does not bias the location model, but location misspecification alters the scale estimates. Conclusion: Our simulation study illustrates the importance of modelling heteroscedasticity, with potential implications beyond mixed-effects models, for example for generalised linear mixed models for non-normal outcomes and joint models with survival data.
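One way to fit such a mixed-effects location-scale model is the brms R package (an assumption here; the abstract does not name the software used). A minimal sketch with hypothetical longitudinal data:

    library(brms)
    # 'd': outcome y, exposure x, measurement time, and subject identifier id.
    melsm <- brm(
      bf(y ~ x + time + (time | id),      # location (mean) submodel
         sigma ~ x + time + (1 | id)),    # scale submodel: log residual SD per subject
      data = d, family = gaussian(),
      chains = 4, cores = 4
    )
    summary(melsm)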
Rainfall increases conformity and strength of species–area relationships
The positive relationship between species richness and area is regarded as one of the few laws in ecology. Therefore, deviations from predictable species–area scaling, evident as high residual variance in species–area curves, are often interpreted as anomalous behaviour. Small-island systems often do not conform to species–area relationships, yet the high stochasticity in their species–area curves is frequently treated as unexplainable noise or attributed to idiosyncratic extinction rates. Here, we introduce a statistical framework that incorporates the degree of stochasticity in species–area relationships as an explicit, interpretable model parameter. Using a global island plant dataset for atolls (378 islands across 19 atolls) – prototypical examples of small-island dynamics – we show that the degree of residual variance in species–area curves can be captured, modelled, and linked to environmental conditions. Our heteroscedastic modelling approach demonstrates that apparent stochasticity in species–area relationships is not random but predictable through environmental drivers. Specifically, we found that increased rainfall reduces the residual variance around the species–area curve, indicating that resource availability is a critical factor enabling conformity to species–area scaling. Cyclone disturbance frequency did not drive stochasticity, challenging the prevailing view that disturbance regimes drive the stochasticity in species–area scaling on small islands. By treating residual variance as an explicit model parameter in species–area relationships rather than unexplainable noise, our approach provides new insights into the conditions that enable biological communities to conform to species–area scaling. Shifting the focus in species–area studies to the residual variance, as an interpretable model parameter that captures the degree of conformity to species–area scaling, offers novel perspectives on the environmental factors that are prerequisites for species–area scaling. This contributes to unifying the apparently anomalous, stochastic nature of small-island systems with the general law of linear species–area scaling.
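Purely as an illustration of the modelling idea (not the authors' data or code), a heteroscedastic species–area model can be sketched as a log–log relationship whose residual standard deviation depends on rainfall, for example with brms:

    library(brms)
    # Hypothetical island-level data 'atolls': columns richness, area, rainfall.
    sar <- brm(
      bf(log(richness) ~ log(area),   # power-law species-area relationship
         sigma ~ rainfall),           # residual SD (conformity) modelled by rainfall
      data = atolls, family = gaussian()
    )
    summary(sar)   # a negative rainfall effect on sigma = tighter fit to the curve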
Mixture density networks for the indirect estimation of reference intervals
Background: Reference intervals represent the expected range of physiological test results in a healthy population and are essential to support medical decision making. Particularly in the context of pediatric reference intervals, where recruitment regulations make prospective studies challenging to conduct, indirect estimation strategies are becoming increasingly important. Established indirect methods enable robust identification of the distribution of "healthy" samples from laboratory databases, which include unlabeled pathologic cases, but are currently severely limited when adjusting for essential patient characteristics such as age. Here, we propose the use of mixture density networks (MDNs) to overcome this problem and model all parameters of the mixture distribution in a single step. Results: Estimated reference intervals from varying settings with simulated data demonstrate the ability to accurately estimate latent distributions from unlabeled data using different implementations of MDNs. Comparing the performance with alternative estimation approaches further highlights the importance of modeling the mixture component weights as a function of the input in order to avoid biased estimates for all other parameters and the resulting reference intervals. We also provide a strategy to generate partially customized starting weights to improve proper identification of the latent components. Finally, the application to real-world hemoglobin samples provides results in line with current gold standard approaches, but also suggests further investigation of adequate regularization strategies to prevent overfitting the data. Conclusions: Mixture density networks provide a promising approach capable of extracting the distribution of healthy samples from unlabeled laboratory databases while simultaneously and explicitly estimating all parameters and component weights as non-linear functions of the covariate(s), thereby allowing the estimation of age-dependent reference intervals in a single step. Further studies on model regularization and asymmetric component distributions are warranted to consolidate our findings and expand the scope of applications.
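Stripped of the neural network, the core idea can be sketched in base R: every parameter of a two-component mixture, including the component weight, is a function of the covariate, and all are estimated jointly by maximum likelihood. Variable names and starting values below are hypothetical, not the paper's implementation.

    # Two-component Gaussian mixture whose weight and means depend on x (e.g. age).
    mix_nll <- function(par, y, x) {
      w   <- plogis(par[1] + par[2] * x)        # mixing weight ("healthy" fraction)
      mu1 <- par[3] + par[4] * x                # component 1 mean (healthy)
      mu2 <- par[5] + par[6] * x                # component 2 mean (pathologic)
      s1  <- exp(par[7]); s2 <- exp(par[8])     # component SDs via a log link
      -sum(log(w * dnorm(y, mu1, s1) + (1 - w) * dnorm(y, mu2, s2)))
    }
    # With lab values y and ages x, fit by maximum likelihood; separated starting
    # means help identification (cf. the customized starting weights mentioned above):
    # fit <- optim(c(0, 0, -1, 0, 1, 0, 0, 0), mix_nll, y = y, x = x, method = "BFGS")
    # Age-dependent 95% reference interval from the healthy component at age x0:
    # qnorm(c(0.025, 0.975), fit$par[3] + fit$par[4] * x0, exp(fit$par[7]))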
Location–scale models in ecology and evolution: Heteroscedasticity in continuous, count and proportion data
Biological data often violate the assumption of constant variance, yet such heteroscedasticity can reflect meaningful biological processes such as plasticity, canalization or stress responses. Despite this, most models treat variance as statistical noise. Here, we reintroduce location–scale regression as a general framework that jointly models the mean (location) and variance (scale) components of a response. We describe three hierarchical extensions: (1) fixed‐effects, (2) mixed‐effects and (3) double‐hierarchical models, which allow researchers to formally test variance structures alongside mean effects, enhancing biological interpretation. This framework is highly flexible and can extend beyond Gaussian assumptions to accommodate real‐world data. The framework accommodates over‐dispersed, under‐dispersed and zero‐inflated count data through the use of negative binomial and Conway–Maxwell–Poisson distributions, and bounded proportion data through beta‐binomial and beta regressions. Submodels can also be incorporated to account for structural zeros and ones when boundary outcomes are common. These extensions allow researchers to capture ecological processes such as presence–absence, success rates and bounded response rates. Using worked examples from published evolutionary and behavioural ecological studies, we illustrate how location–scale models can uncover biologically meaningful variance patterns that are overlooked in models focused solely on means. For instance, we show how food supplementation, hatching order and predation risk influence not only average trait values but also their variability. Each example corresponds to one of the model types and is implemented using widely used R packages such as glmmTMB and brms. All examples are accompanied by a freely accessible, step‐by‐step online tutorial, thereby lowering technical barriers and fostering broader adoption of location–scale modelling in ecological and evolutionary research. Finally, we propose a practical workflow for model selection and diagnostics and highlight recent extensions of the framework. These include multi‐response models, meta‐analytic models, phylogenetic comparative models and models including shape parameters such as skewness. Treating variance as a biologically informative response opens new avenues for us to explore the evolutionary, ecological and environmental processes that shape biological systems across diverse contexts.
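As the abstract notes, these models can be fitted with widely used R packages; below is a minimal glmmTMB sketch of a location-scale model with a treatment effect on both the mean and the dispersion (variable names hypothetical):

    library(glmmTMB)
    # 'd': response y, factor treatment, grouping factor id (e.g. individual).
    m <- glmmTMB(
      y ~ treatment + (1 | id),      # location (mean) model with a random intercept
      dispformula = ~ treatment,     # scale model: treatment effect on dispersion
      family = gaussian(), data = d
    )
    summary(m)
    # Count-data analogues: family = nbinom2() (over-dispersion) or compois()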
Distributional regression in clinical trials: treatment effects on parameters other than the mean
Background: The classical linear model is widely used in the analysis of clinical trials with continuous outcomes. However, required model assumptions are frequently not met, resulting in estimates of treatment effect that can be inefficient and biased. In addition, traditional models assess treatment effect only on the mean response, and not on other aspects of the response, such as the variance. Distributional regression modelling overcomes these limitations. The purpose of this paper is to demonstrate its usefulness for the analysis of clinical trials, and its superior performance to that of traditional models. Methods: Distributional regression models are demonstrated, and contrasted with normal linear models, on data from the LIPID randomized controlled trial, which compared the effects of pravastatin with placebo in patients with coronary heart disease. Systolic blood pressure (SBP) and the biomarker midregional pro-adrenomedullin (MR-proADM) were analysed. Treatment effect was estimated in models that used response distributions more appropriate than the normal (Box-Cox t and Johnson's SU for MR-proADM and SBP, respectively), applied censoring below the detection limit of MR-proADM, estimated treatment effect on distributional parameters other than the mean, and included random effects for longitudinal observations. A simulation study was conducted to compare the performance of distributional regression models with normal linear regression, under conditions mimicking the LIPID study. The R package gamlss (Generalized Additive Models for Location, Scale and Shape), which implements maximum likelihood estimation for distributional regression modelling, was used throughout. Results: In all cases the distributional regression models fit the data well, in contrast to the poor fits obtained for traditional models; for MR-proADM a small but significant treatment effect on the mean was detected by the distributional regression model and not the normal model; and for SBP a beneficial treatment effect on the variance was demonstrated. In the simulation study distributional models strongly outperformed normal models when the response variable was non-normal and heterogeneous, and there was no disadvantage introduced by the use of distributional regression modelling when the response satisfied the normal linear model assumptions. Conclusions: Distributional regression models are a rich framework, largely untapped in the clinical trials world. We have demonstrated a sample of the capabilities of these models for the analysis of trials. If interest lies in accurate estimation of treatment effect on the mean, or on other distributional features such as variance, the use of distributional regression modelling will yield superior estimates to traditional normal models, and is strongly recommended. Trial registration: The LIPID trial was retrospectively registered on ANZCTR on 27/04/2016, registration number ACTRN12616000535471.
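A hedged sketch (not the LIPID data or the authors' code) of a gamlss model of the kind described, with a treatment effect on both the location and the scale of a continuous outcome under Johnson's SU distribution:

    library(gamlss)
    # Hypothetical trial data 'trial': outcome sbp, treatment arm 'treat'.
    fit <- gamlss(sbp ~ treat,
                  sigma.formula = ~ treat,   # treatment effect on the scale parameter
                  family = JSU,              # Johnson's SU, as used for SBP in the paper
                  data = trial)
    summary(fit)
    # For comparison, the normal linear model: gamlss(sbp ~ treat, family = NO, data = trial)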