Catalogue Search | MBRL

Bayesian regression tree ensembles that adapt to smoothness and sparsity

by Yang, Yun , Linero, Antonio R. in Additives , algorithms , Bayesian additive regression trees

2018

Ensembles of decision trees are a useful tool for obtaining flexible estimates of regression functions. Examples of these methods include gradient-boosted decision trees, random forests and Bayesian classification and regression trees. Two potential shortcomings of tree ensembles are their lack of smoothness and their vulnerability to the curse of dimensionality. We show that these issues can be overcome by instead considering sparsity inducing soft decision trees in which the decisions are treated as probabilistic. We implement this in the context of the Bayesian additive regression trees framework and illustrate its promising performance through testing on benchmark data sets. We provide strong theoretical support for our methodology by showing that the posterior distribution concentrates at the minimax rate (up to a logarithmic factor) for sparse functions and functions with additive structures in the high dimensional regime where the dimensionality of the covariate space is allowed to grow nearly exponentially in the sample size. Our method also adapts to the unknown smoothness and sparsity levels, and can be implemented by making minimal modifications to existing Bayesian additive regression tree algorithms.

Journal Article

Bayesian Regression Trees for High-Dimensional Prediction and Variable Selection

by Linero, Antonio R. in Asymptotic properties , Bayesian additive regression trees , Bayesian analysis

2018

Decision tree ensembles are an extremely popular tool for obtaining high-quality predictions in nonparametric regression problems. Unmodified, however, many commonly used decision tree ensemble methods do not adapt to sparsity in the regime in which the number of predictors is larger than the number of observations. A recent stream of research concerns the construction of decision tree ensembles that are motivated by a generative probabilistic model, the most influential method being the Bayesian additive regression trees (BART) framework. In this article, we take a Bayesian point of view on this problem and show how to construct priors on decision tree ensembles that are capable of adapting to sparsity in the predictors by placing a sparsity-inducing Dirichlet hyperprior on the splitting proportions of the regression tree prior. We characterize the asymptotic distribution of the number of predictors included in the model and show how this prior can be easily incorporated into existing Markov chain Monte Carlo schemes. We demonstrate that our approach yields useful posterior inclusion probabilities for each predictor and illustrate the usefulness of our approach relative to other decision tree ensemble approaches on both simulated and real datasets. Supplementary materials for this article are available online.

Journal Article

Semiparametric mixed-scale models using shared Bayesian forests

by Sinha, Debajyoti , Linero, Antonio R. , Lipsitz, Stuart R. in Bayesian additive regression trees , Bayesian analysis , Bayesian theory

2020

This paper demonstrates the advantages of sharing information about unknown features of covariates across multiple model components in various nonparametric regression problems including multivariate, heteroscedastic, and semicontinuous responses. In this paper, we present a methodology which allows for information to be shared nonparametrically across various model components using Bayesian sum-of-tree models. Our simulation results demonstrate that sharing of information across related model components is often very beneficial, particularly in sparse high-dimensional problems in which variable selection must be conducted. We illustrate our methodology by analyzing medical expenditure data from the Medical Expenditure Panel Survey (MEPS). To facilitate the Bayesian nonparametric regression analysis, we develop two novel models for analyzing the MEPS data using Bayesian additive regression trees: a heteroskedastic log-normal hurdle model with a “shrink-toward-homoskedasticity” prior and a gamma hurdle model.

Journal Article

Combining BART and Principal Stratification to estimate the effect of intermediate variables on primary outcomes with application to estimating the effect of family planning on employment in Nigeria and Senegal

by Alkema, Leontine , Godoy Garraza, Lucas , Speizer, Ilene in Bayesian Additive Regression Trees , Bayesian Bootstrap , Generalizability

2026

There is interest in learning about the causal effects of modern contraceptive use on empowerment outcomes. Data on this question often come from family planning (FP) programs that increase access to FP and facilitate contraceptive use among some women, rather than directly assigning use. Women whose contraceptive behavior changes because of these programs (“compliers”) may differ from target populations in ways that alter the consequences of contraceptive use for empowerment outcomes. We propose a two-step approach. First, we use principal stratification and Bayesian Additive Regression Trees (BART) to estimate the effect of modern contraceptive use among compliers in the study population, treating the FP program as an instrument rather than as the treatment of interest. Second, we generalize these complier-specific effects to a broader population by averaging conditional effects over the covariate distribution in the target population, with uncertainty in that distribution quantified via a Bayesian bootstrap applied to external complex survey data. We examine performance in simulation designs previously used to evaluate IV estimators. We then apply the approach to employment among urban women in Nigeria and Senegal, finding strong and heterogeneous effects of contraceptive use. Sensitivity analyses suggest robustness to violations of assumptions for internal and external validity.

Journal Article

INFERENCE IN BAYESIAN ADDITIVE VECTOR AUTOREGRESSIVE TREE MODELS

by Rossini, Luca , Huber, Florian

2022

Vector autoregressive (VAR) models assume linearity between the endogenous variables and their lags. This assumption might be overly restrictive and could have a deleterious impact on forecasting accuracy. As a solution we propose combining VAR with Bayesian additive regression tree (BART) models. The resulting Bayesian additive vector autoregressive tree (BAVART) model is capable of capturing arbitrary nonlinear relations between the endogenous variables and the covariates without much input from the researcher. Since controlling for heteroscedasticity is key for producing precise density forecasts, our model allows for stochastic volatility in the errors. We apply our model to two datasets. The first application shows that the BAVART model yields highly competitive forecasts of the U.S. term structure of interest rates. In a second application we estimate our model using a moderately sized Eurozone dataset to investigate the dynamic effects of uncertainty on the economy.

Journal Article

Effects of interval treadmill training on spatiotemporal parameters in children with cerebral palsy: A machine learning approach

by Roge, Desiree , Bjornson, Kristie F. , Steele, Katherine M. in Accuracy , Adolescent , Bayes Theorem

2024

Quantifying individualized rehabilitation responses and optimizing therapy for each person is challenging. For interventions like treadmill training, there are multiple parameters, such as speed or incline, that can be adjusted throughout sessions. This study evaluates if causal modeling and Bayesian Additive Regression Trees (BART) can be used to accurately track the direct effects of treadmill training on gait. We developed a Directed Acyclic Graph (DAG) to specify the assumed relationship between training input parameters and spatiotemporal outcomes during Short Burst Locomotor Treadmill Training (SBLTT), a therapy designed specifically for children with cerebral palsy (CP). We evaluated outcomes after 24 sessions of SBLTT for simulated datasets of 150 virtual participants and experimental data from four children with CP, ages 4–13 years old. Individual BART models were created from treadmill data of each step. Simulated datasets demonstrated that BART could accurately identify specified responses to training, including strong correlations for step length progression (R² = 0.73) and plateaus (R² = 0.87). Model fit was stronger for participants with less step-to-step variability but did not impact model accuracy. For experimental data, participants’ step lengths increased by 26 ± 13% after 24 sessions. Using BART to control for speed or incline, we found that step length increased for three participants (direct effect: 13.5 ± 4.5%), while one participant decreased step length (−11.6%). SBLTT had minimal effects on step length asymmetry and step width. Tools such as BART can leverage step-by-step data collected during training for researchers and clinicians to monitor progression, optimize rehabilitation protocols, and inform the causal mechanisms driving individual responses.

Journal Article

A Bayesian Machine Learning Approach for Optimizing Dynamic Treatment Regimes

by Thall, Peter F. , Murray, Thomas A. , Yuan, Ying in Algorithms , Approximate dynamic programming , artificial intelligence

2018

Medical therapy often consists of multiple stages, with a treatment chosen by the physician at each stage based on the patient's history of treatments and clinical outcomes. These decisions can be formalized as a dynamic treatment regime. This article describes a new approach for optimizing dynamic treatment regimes, which bridges the gap between Bayesian inference and existing approaches, like Q-learning. The proposed approach fits a series of Bayesian regression models, one for each stage, in reverse sequential order. Each model uses as a response variable the remaining payoff assuming optimal actions are taken at subsequent stages, and as covariates the current history and relevant actions at that stage. The key difficulty is that the optimal decision rules at subsequent stages are unknown, and even if these decision rules were known the relevant response variables may be counterfactual. However, posterior distributions can be derived from the previously fitted regression models for the optimal decision rules and the counterfactual response variables under a particular set of rules. The proposed approach averages over these posterior distributions when fitting each regression model. An efficient sampling algorithm for estimation is presented, along with simulation studies that compare the proposed approach with Q-learning. Supplementary materials for this article are available online.

Journal Article

Regional variability in the associations between social and health-related risk factors and memory across Europe

by Wang, Huixia Savannah , Rieckmann, Anna , Josefsson, Maria in Activities of daily living , Aged , Aged, 80 and over

2026

Interventions targeting social and health-related risk factors are thought to reduce the risk of cognitive decline and dementia in older age. Despite well-known social, economic, and cultural differences across European countries, little is known about how these factors influence associations with memory function in different geographical contexts. This study examined the relationship between five social and health-related risk factors, namely living alone, physical inactivity, obesity, depression, and cardiometabolic and cardiovascular conditions, and memory function across Europe. Data came from the Survey of Health, Ageing, and Retirement in Europe (SHARE), a cross-national study of older adults. The sample included cross-sectional data for 102,851 adults aged 50-102 years from 20 European countries, grouped into four regions: Northern, Western, Eastern, and Southern Europe. Memory function was assessed using a sum score of immediate and delayed recall tests. A flexible Bayesian machine learning approach for multilevel data was applied to assess heterogeneity of associations in the total sample and in analyses stratified by education and age. All five social and health-related risk factors were negatively associated with memory overall, but the strength and, for some factors, the direction of these associations varied across regions. In particular, the associations for living alone, obesity, and physical inactivity differed between Eastern and Southern Europe compared with Northern and Western Europe. These findings highlight substantial geographical heterogeneity in the associations between social and health-related risk factors and memory, which should be considered when designing and implementing public health interventions.

Journal Article

Novel Bayesian Additive Regression Tree Methodology for Flood Susceptibility Modeling

by Vafakhah, Mehdi , Dinan, Naghmeh Mobarghaee , Kapelan, Zoran in Additives , Altitude , Bayesian analysis

2021

Identifying areas prone to flooding is a key step in flood risk management. The purpose of this study is to develop and present a novel flood susceptibility model based on Bayesian Additive Regression Tree (BART) methodology. The predictive performance of the new model is assessed via comparison with the Naïve Bayes (NB) and Random Forest (RF) based methods that were previously published in the literature. All models were tested on a real case study based in the Kan watershed in Iran. The following fifteen climatic and geo-environmental variables were used as inputs into all flood susceptibility models: altitude, aspect, slope, plan curvature, profile curvature, drainage density, distance from river, distance from road, stream power index (SPI), topographic wetness index (TWI), topographic position index (TPI), curve number (CN), land use, lithology and rainfall. Based on the existing flood field survey and other information available for the analyzed area, a total of 118 flood locations were identified as potentially prone to flooding. The data available were divided into two groups with 70% used for training and 30% for validation of all models. The receiver operating characteristic (ROC) curve parameters were used to evaluate the predictive accuracy of the new and existing models. Based on the area under curve (AUC) the new BART (86%) model outperformed the NB (80%) and RF (85%) models. Regarding the importance of input variables, the results obtained showed that the location’s altitude and distance from the river are the most important variables for assessing flooding susceptibility.

Journal Article

A Bayesian Nonparametric Approach to Causal Inference on Quantiles

by Winterstein, Almut G. , Xu, Dandan , Daniels, Michael J. in Acute Kidney Injury - etiology , Bayes Theorem , Bayesian additive regression trees (BART)

2018

We propose a Bayesian nonparametric approach (BNP) for causal inference on quantiles in the presence of many confounders. In particular, we define relevant causal quantities and specify BNP models to avoid bias from restrictive parametric assumptions. We first use Bayesian additive regression trees (BART) to model the propensity score and then construct the distribution of potential outcomes given the propensity score using a Dirichlet process mixture (DPM) of normals model. We thoroughly evaluate the operating characteristics of our approach and compare it to Bayesian and frequentist competitors. We use our approach to answer an important clinical question involving acute kidney injury using electronic health records.

Journal Article
